<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Integration</title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet"/>
</head>
<body>
<h1>Integration</h1>
<ul class="simple">
<li><a class="reference internal" href="#reverseproxy"><span class="std std-ref">Reverse Proxy</span></a></li>
<li><a class="reference internal" href="#azure"><span class="std std-ref">Azure: Microsoft Azure</span></a></li>
<li><a class="reference internal" href="#aws"><span class="std std-ref">AWS: Amazon Web Services</span></a></li>
<li><a class="reference internal" href="#databricks"><span class="std std-ref">Databricks</span></a></li>
<li><a class="reference internal" href="#gcp"><span class="std std-ref">GCP: Google Cloud Platform</span></a></li>
</ul>
<div class="section" id="reverse-proxy">
<span id="reverseproxy"></span><h2 class="sigil_not_in_toc">Reverse Proxy</h2>
<p>Airflow can be set up behind a reverse proxy, with full control over the endpoint under
which it is exposed.</p>
<p>For example, you can configure your reverse proxy to serve Airflow at:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">lab</span><span class="o">.</span><span class="n">mycompany</span><span class="o">.</span><span class="n">com</span><span class="o">/</span><span class="n">myorg</span><span class="o">/</span><span class="n">airflow</span><span class="o">/</span>
</pre>
</div>
</div>
<p>To do so, set the following option in your <cite>airflow.cfg</cite> so that it matches the URL served by the proxy:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>base_url = https://lab.mycompany.com/myorg/airflow
</pre>
</div>
</div>
<p>Additionally, if you use the Celery Executor, you can serve Flower at <cite>/myorg/flower</cite> with:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">flower_url_prefix</span> <span class="o">=</span> <span class="o">/</span><span class="n">myorg</span><span class="o">/</span><span class="n">flower</span>
</pre>
</div>
</div>
<p>Your reverse proxy (e.g. nginx) should be configured as follows:</p>
<ul>
<li><p class="first">pass the URL and HTTP headers through to the Airflow webserver unchanged, without any rewrite, for example:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>server {
  listen 80;
  server_name lab.mycompany.com;

  location /myorg/airflow/ {
      proxy_pass http://localhost:8080;
      proxy_set_header Host $host;
      proxy_redirect off;
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection &quot;upgrade&quot;;
  }
}
</pre>
</div>
</div>
</li>
<li><p class="first">rewrite the URL for the Flower endpoint:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>server {
    listen 80;
    server_name lab.mycompany.com;

    location /myorg/flower/ {
        rewrite ^/myorg/flower/(.*)$ /$1 break;  # strip the /myorg/flower/ prefix from the request path
        proxy_pass http://localhost:5555;
        proxy_set_header Host $host;
        proxy_redirect off;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection &quot;upgrade&quot;;
    }
}
</pre>
</div>
</div>
</li>
</ul>
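<p>Putting the two settings together, the relevant parts of <cite>airflow.cfg</cite> for the example above would look roughly like the following sketch (section names as in recent 1.x configs; adjust the host and prefixes to your deployment):</p>

```ini
[webserver]
# Must match the public URL served by the reverse proxy
base_url = https://lab.mycompany.com/myorg/airflow

[celery]
# Only relevant when using the Celery Executor with Flower
flower_url_prefix = /myorg/flower
```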
</div>
<div class="section" id="azure-microsoft-azure">
<span id="azure"></span><h2 class="sigil_not_in_toc">Azure: Microsoft Azure</h2>
<p>Airflow has limited support for Microsoft Azure: interfaces exist only for Azure Blob
Storage and Azure Data Lake. The hook, sensor and operator for Blob Storage, and the
Azure Data Lake hook, are in the contrib section.</p>
<div class="section" id="azure-blob-storage">
<h3 class="sigil_not_in_toc">Azure Blob Storage</h3>
<p>All classes communicate via the Windows Azure Storage Blob (wasb) protocol. Make sure that an
Airflow connection of type <cite>wasb</cite> exists. Authorization can be done by supplying a
login (= storage account name) and password (= account key), or a login and SAS token in the extra
field (see connection <cite>wasb_default</cite> for an example).</p>
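<p>For illustration, the SAS-token variant of the extra field is plain JSON. The helper below is hypothetical (not part of Airflow) and uses only the stdlib; the token value is a placeholder:</p>

```python
import json

def wasb_sas_extra(sas_token):
    # Build the JSON stored in the connection's "extra" field when
    # authenticating with a SAS token instead of an account key.
    return json.dumps({"sas_token": sas_token})

extra = wasb_sas_extra("?sv=2018-03-28&sig=PLACEHOLDER")
# json.loads(extra) == {"sas_token": "?sv=2018-03-28&sig=PLACEHOLDER"}
```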
<ul class="simple">
<li><a class="reference internal" href="#wasbblobsensor"><span class="std std-ref">WasbBlobSensor</span></a>: Checks if a blob is present on Azure Blob storage.</li>
<li><a class="reference internal" href="#wasbprefixsensor"><span class="std std-ref">WasbPrefixSensor</span></a>: Checks if blobs matching a prefix are present on Azure Blob storage.</li>
<li><a class="reference internal" href="#filetowasboperator"><span class="std std-ref">FileToWasbOperator</span></a>: Uploads a local file to a container as a blob.</li>
<li><a class="reference internal" href="#wasbhook"><span class="std std-ref">WasbHook</span></a>: Interface with Azure Blob Storage.</li>
</ul>
<div class="section" id="wasbblobsensor">
<span id="id1"></span><h4 class="sigil_not_in_toc">WasbBlobSensor</h4>

<pre>
class airflow.contrib.sensors.wasb_sensor.WasbBlobSensor(container_name, blob_name, wasb_conn_id=&apos;wasb_default&apos;, check_options=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.sensors.base_sensor_operator.BaseSensorOperator" title="airflow.sensors.base_sensor_operator.BaseSensorOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.sensors.base_sensor_operator.BaseSensorOperator</span></code></a></p>
<p>Waits for a blob to arrive on Azure Blob Storage.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>container_name</strong> (<em>str</em>) &#x2013; Name of the container.</li>
<li><strong>blob_name</strong> (<em>str</em>) &#x2013; Name of the blob.</li>
<li><strong>wasb_conn_id</strong> (<em>str</em>) &#x2013; Reference to the wasb connection.</li>
<li><strong>check_options</strong> (<em>dict</em>) &#x2013; Optional keyword arguments that
<cite>WasbHook.check_for_blob()</cite> takes.</li>
</ul>
</td>
</tr>
</tbody>
</table>

<pre>
poke(context)</pre>
<p>Function that sensors deriving from this class should
override.</p>






</div>
<div class="section" id="wasbprefixsensor">
<span id="id2"></span><h4 class="sigil_not_in_toc">WasbPrefixSensor</h4>

<pre>
class airflow.contrib.sensors.wasb_sensor.WasbPrefixSensor(container_name, prefix, wasb_conn_id=&apos;wasb_default&apos;, check_options=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.sensors.base_sensor_operator.BaseSensorOperator" title="airflow.sensors.base_sensor_operator.BaseSensorOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.sensors.base_sensor_operator.BaseSensorOperator</span></code></a></p>
<p>Waits for blobs matching a prefix to arrive on Azure Blob Storage.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>container_name</strong> (<em>str</em>) &#x2013; Name of the container.</li>
<li><strong>prefix</strong> (<em>str</em>) &#x2013; Prefix of the blob.</li>
<li><strong>wasb_conn_id</strong> (<em>str</em>) &#x2013; Reference to the wasb connection.</li>
<li><strong>check_options</strong> (<em>dict</em>) &#x2013; Optional keyword arguments that
<cite>WasbHook.check_for_prefix()</cite> takes.</li>
</ul>
</td>
</tr>
</tbody>
</table>

<pre>
poke(context)</pre>
<p>Function that sensors deriving from this class should
override.</p>
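<p>Conceptually, the prefix check succeeds as soon as at least one blob name in the container starts with the given prefix. A pure-Python sketch of that success condition (the actual hook delegates the listing to <cite>BlockBlobService.list_blobs()</cite>):</p>

```python
def prefix_matches(blob_names, prefix):
    # Mirrors the success condition of WasbPrefixSensor: at least one
    # blob whose name starts with the prefix.
    return any(name.startswith(prefix) for name in blob_names)

blobs = ["logs/2018-01-01/task.log", "data/input.csv"]
prefix_matches(blobs, "logs/")  # True
prefix_matches(blobs, "tmp/")   # False
```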






</div>
<div class="section" id="filetowasboperator">
<span id="id3"></span><h4 class="sigil_not_in_toc">FileToWasbOperator</h4>

<pre>
class airflow.contrib.operators.file_to_wasb.FileToWasbOperator(file_path, container_name, blob_name, wasb_conn_id=&apos;wasb_default&apos;, load_options=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Uploads a file to Azure Blob Storage.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>file_path</strong> (<em>str</em>) &#x2013; Path to the file to load. (templated)</li>
<li><strong>container_name</strong> (<em>str</em>) &#x2013; Name of the container. (templated)</li>
<li><strong>blob_name</strong> (<em>str</em>) &#x2013; Name of the blob. (templated)</li>
<li><strong>wasb_conn_id</strong> (<em>str</em>) &#x2013; Reference to the wasb connection.</li>
<li><strong>load_options</strong> (<em>dict</em>) &#x2013; Optional keyword arguments that
<cite>WasbHook.load_file()</cite> takes.</li>
</ul>
</td>
</tr>
</tbody>
</table>

<pre>
execute(context)</pre>
<p>Upload a file to Azure Blob Storage.</p>






</div>
<div class="section" id="wasbhook">
<span id="id4"></span><h4 class="sigil_not_in_toc">WasbHook</h4>

<pre>
class airflow.contrib.hooks.wasb_hook.WasbHook(wasb_conn_id=&apos;wasb_default&apos;)</pre>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.hooks.base_hook.BaseHook</span></code></p>
<p>Interacts with Azure Blob Storage through the wasb:// protocol.</p>
<p>Additional options passed in the &#x2018;extra&#x2019; field of the connection will be
passed to the <cite>BlockBlobService()</cite> constructor. For example, authenticate
using a SAS token by adding {&#x201C;sas_token&#x201D;: &#x201C;YOUR_TOKEN&#x201D;}.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>wasb_conn_id</strong> (<em>str</em>) &#x2013; Reference to the wasb connection.</td>
</tr>
</tbody>
</table>

<pre>
check_for_blob(container_name, blob_name, **kwargs)</pre>
<p>Check if a blob exists on Azure Blob Storage.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>container_name</strong> (<em>str</em>) &#x2013; Name of the container.</li>
<li><strong>blob_name</strong> (<em>str</em>) &#x2013; Name of the blob.</li>
<li><strong>kwargs</strong> (<em>object</em>) &#x2013; Optional keyword arguments that
<cite>BlockBlobService.exists()</cite> takes.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body"><p class="first last">True if the blob exists, False otherwise.</p>
</td>
</tr>
</tbody>
</table>
<p>Return type: <cite>bool</cite></p>




<pre>
check_for_prefix(container_name, prefix, **kwargs)</pre>
<p>Check if a prefix exists on Azure Blob storage.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>container_name</strong> (<em>str</em>) &#x2013; Name of the container.</li>
<li><strong>prefix</strong> (<em>str</em>) &#x2013; Prefix of the blob.</li>
<li><strong>kwargs</strong> (<em>object</em>) &#x2013; Optional keyword arguments that
<cite>BlockBlobService.list_blobs()</cite> takes.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body"><p class="first last">True if blobs matching the prefix exist, False otherwise.</p>
</td>
</tr>
</tbody>
</table>
<p>Return type: <cite>bool</cite></p>




<pre>
get_conn()</pre>
<p>Return the BlockBlobService object.</p>




<pre>
get_file(file_path, container_name, blob_name, **kwargs)</pre>
<p>Download a file from Azure Blob Storage.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>file_path</strong> (<em>str</em>) &#x2013; Path to the file to download.</li>
<li><strong>container_name</strong> (<em>str</em>) &#x2013; Name of the container.</li>
<li><strong>blob_name</strong> (<em>str</em>) &#x2013; Name of the blob.</li>
<li><strong>kwargs</strong> (<em>object</em>) &#x2013; Optional keyword arguments that
<cite>BlockBlobService.get_blob_to_path()</cite> takes.</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
load_file(file_path, container_name, blob_name, **kwargs)</pre>
<p>Upload a file to Azure Blob Storage.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>file_path</strong> (<em>str</em>) &#x2013; Path to the file to load.</li>
<li><strong>container_name</strong> (<em>str</em>) &#x2013; Name of the container.</li>
<li><strong>blob_name</strong> (<em>str</em>) &#x2013; Name of the blob.</li>
<li><strong>kwargs</strong> (<em>object</em>) &#x2013; Optional keyword arguments that
<cite>BlockBlobService.create_blob_from_path()</cite> takes.</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
load_string(string_data, container_name, blob_name, **kwargs)</pre>
<p>Upload a string to Azure Blob Storage.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>string_data</strong> (<em>str</em>) &#x2013; String to load.</li>
<li><strong>container_name</strong> (<em>str</em>) &#x2013; Name of the container.</li>
<li><strong>blob_name</strong> (<em>str</em>) &#x2013; Name of the blob.</li>
<li><strong>kwargs</strong> (<em>object</em>) &#x2013; Optional keyword arguments that
<cite>BlockBlobService.create_blob_from_text()</cite> takes.</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
read_file(container_name, blob_name, **kwargs)</pre>
<p>Read a file from Azure Blob Storage and return as a string.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>container_name</strong> (<em>str</em>) &#x2013; Name of the container.</li>
<li><strong>blob_name</strong> (<em>str</em>) &#x2013; Name of the blob.</li>
<li><strong>kwargs</strong> (<em>object</em>) &#x2013; Optional keyword arguments that
<cite>BlockBlobService.get_blob_to_text()</cite> takes.</li>
</ul>
</td>
</tr>
</tbody>
</table>






</div>
</div>
<div class="section" id="azure-file-share">
<h3 class="sigil_not_in_toc">Azure File Share</h3>
<p>Cloud variant of an SMB file share. Make sure that an Airflow connection of
type <cite>wasb</cite> exists. Authorization can be done by supplying a login (= storage account name)
and password (= storage account key), or a login and SAS token in the extra field
(see connection <cite>wasb_default</cite> for an example).</p>
<div class="section" id="azurefilesharehook">
<h4 class="sigil_not_in_toc">AzureFileShareHook</h4>

<pre>
class airflow.contrib.hooks.azure_fileshare_hook.AzureFileShareHook(wasb_conn_id=&apos;wasb_default&apos;)</pre>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.hooks.base_hook.BaseHook</span></code></p>
<p>Interacts with Azure FileShare Storage.</p>
<p>Additional options passed in the &#x2018;extra&#x2019; field of the connection will be
passed to the <cite>FileService()</cite> constructor.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>wasb_conn_id</strong> (<em>str</em>) &#x2013; Reference to the wasb connection.</td>
</tr>
</tbody>
</table>

<pre>
check_for_directory(share_name, directory_name, **kwargs)</pre>
<p>Check if a directory exists on Azure File Share.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>share_name</strong> (<em>str</em>) &#x2013; Name of the share.</li>
<li><strong>directory_name</strong> (<em>str</em>) &#x2013; Name of the directory.</li>
<li><strong>kwargs</strong> (<em>object</em>) &#x2013; Optional keyword arguments that
<cite>FileService.exists()</cite> takes.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body"><p class="first last">True if the directory exists, False otherwise.</p>
</td>
</tr>
</tbody>
</table>
<p>Return type: <cite>bool</cite></p>




<pre>
check_for_file(share_name, directory_name, file_name, **kwargs)</pre>
<p>Check if a file exists on Azure File Share.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>share_name</strong> (<em>str</em>) &#x2013; Name of the share.</li>
<li><strong>directory_name</strong> (<em>str</em>) &#x2013; Name of the directory.</li>
<li><strong>file_name</strong> (<em>str</em>) &#x2013; Name of the file.</li>
<li><strong>kwargs</strong> (<em>object</em>) &#x2013; Optional keyword arguments that
<cite>FileService.exists()</cite> takes.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body"><p class="first last">True if the file exists, False otherwise.</p>
</td>
</tr>
</tbody>
</table>
<p>Return type: <cite>bool</cite></p>




<pre>
create_directory(share_name, directory_name, **kwargs)</pre>
<p>Create a new directory on an Azure File Share.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>share_name</strong> (<em>str</em>) &#x2013; Name of the share.</li>
<li><strong>directory_name</strong> (<em>str</em>) &#x2013; Name of the directory.</li>
<li><strong>kwargs</strong> (<em>object</em>) &#x2013; Optional keyword arguments that
<cite>FileService.create_directory()</cite> takes.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body"><p class="first last">True if the directory was created, False otherwise.</p>
</td>
</tr>
</tbody>
</table>
<p>Return type: <cite>bool</cite></p>




<pre>
get_conn()</pre>
<p>Return the FileService object.</p>




<pre>
get_file(file_path, share_name, directory_name, file_name, **kwargs)</pre>
<p>Download a file from Azure File Share.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>file_path</strong> (<em>str</em>) &#x2013; Where to store the file.</li>
<li><strong>share_name</strong> (<em>str</em>) &#x2013; Name of the share.</li>
<li><strong>directory_name</strong> (<em>str</em>) &#x2013; Name of the directory.</li>
<li><strong>file_name</strong> (<em>str</em>) &#x2013; Name of the file.</li>
<li><strong>kwargs</strong> (<em>object</em>) &#x2013; Optional keyword arguments that
<cite>FileService.get_file_to_path()</cite> takes.</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
get_file_to_stream(stream, share_name, directory_name, file_name, **kwargs)</pre>
<p>Download a file from Azure File Share.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>stream</strong> (<em>file-like object</em>) &#x2013; A filehandle to store the file to.</li>
<li><strong>share_name</strong> (<em>str</em>) &#x2013; Name of the share.</li>
<li><strong>directory_name</strong> (<em>str</em>) &#x2013; Name of the directory.</li>
<li><strong>file_name</strong> (<em>str</em>) &#x2013; Name of the file.</li>
<li><strong>kwargs</strong> (<em>object</em>) &#x2013; Optional keyword arguments that
<cite>FileService.get_file_to_stream()</cite> takes.</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
list_directories_and_files(share_name, directory_name=None, **kwargs)</pre>
<p>Return the list of directories and files stored on an Azure File Share.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>share_name</strong> (<em>str</em>) &#x2013; Name of the share.</li>
<li><strong>directory_name</strong> (<em>str</em>) &#x2013; Name of the directory.</li>
<li><strong>kwargs</strong> (<em>object</em>) &#x2013; Optional keyword arguments that
<cite>FileService.list_directories_and_files()</cite> takes.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body"><p class="first last">A list of files and directories</p>
</td>
</tr>
</tbody>
</table>
<p>Return type: <cite>list</cite></p>




<pre>
load_file(file_path, share_name, directory_name, file_name, **kwargs)</pre>
<p>Upload a file to Azure File Share.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>file_path</strong> (<em>str</em>) &#x2013; Path to the file to load.</li>
<li><strong>share_name</strong> (<em>str</em>) &#x2013; Name of the share.</li>
<li><strong>directory_name</strong> (<em>str</em>) &#x2013; Name of the directory.</li>
<li><strong>file_name</strong> (<em>str</em>) &#x2013; Name of the file.</li>
<li><strong>kwargs</strong> (<em>object</em>) &#x2013; Optional keyword arguments that
<cite>FileService.create_file_from_path()</cite> takes.</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
load_stream(stream, share_name, directory_name, file_name, count, **kwargs)</pre>
<p>Upload a stream to Azure File Share.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>stream</strong> (<em>file-like</em>) &#x2013; Opened file/stream to upload as the file content.</li>
<li><strong>share_name</strong> (<em>str</em>) &#x2013; Name of the share.</li>
<li><strong>directory_name</strong> (<em>str</em>) &#x2013; Name of the directory.</li>
<li><strong>file_name</strong> (<em>str</em>) &#x2013; Name of the file.</li>
<li><strong>count</strong> (<em>int</em>) &#x2013; Size of the stream in bytes</li>
<li><strong>kwargs</strong> (<em>object</em>) &#x2013; Optional keyword arguments that
<cite>FileService.create_file_from_stream()</cite> takes.</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
load_string(string_data, share_name, directory_name, file_name, **kwargs)</pre>
<p>Upload a string to Azure File Share.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>string_data</strong> (<em>str</em>) &#x2013; String to load.</li>
<li><strong>share_name</strong> (<em>str</em>) &#x2013; Name of the share.</li>
<li><strong>directory_name</strong> (<em>str</em>) &#x2013; Name of the directory.</li>
<li><strong>file_name</strong> (<em>str</em>) &#x2013; Name of the file.</li>
<li><strong>kwargs</strong> (<em>object</em>) &#x2013; Optional keyword arguments that
<cite>FileService.create_file_from_text()</cite> takes.</li>
</ul>
</td>
</tr>
</tbody>
</table>






</div>
</div>
<div class="section" id="logging">
<h3 class="sigil_not_in_toc">Logging</h3>
<p>Airflow can be configured to read and write task logs in Azure Blob Storage.
See <a class="reference internal" href="howto/write-logs.html#write-logs-azure"><span class="std std-ref">Writing Logs to Azure Blob Storage</span></a>.</p>
</div>
<div class="section" id="azure-data-lake">
<h3 class="sigil_not_in_toc">Azure Data Lake</h3>
<p>AzureDataLakeHook communicates via a REST API compatible with WebHDFS. Make sure that an
Airflow connection of type <cite>azure_data_lake</cite> exists. Authorization can be done by supplying a
login (= Client ID), password (= Client Secret) and the extra fields tenant (Tenant) and
account_name (Account Name) (see connection <cite>azure_data_lake_default</cite> for an example).</p>
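<p>As with the wasb connection, the extra field is plain JSON. A hypothetical stdlib helper to build it (the tenant and account name below are placeholders):</p>

```python
import json

def adl_extra(tenant, account_name):
    # "extra" JSON for an `azure_data_lake` connection; the keys match
    # those the hook expects, the values are placeholders.
    return json.dumps({"tenant": tenant, "account_name": account_name})

adl_extra("00000000-0000-0000-0000-000000000000", "myadlsaccount")
```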
<ul class="simple">
<li><a class="reference internal" href="#azuredatalakehook"><span class="std std-ref">AzureDataLakeHook</span></a>: Interface with Azure Data Lake.</li>
</ul>
<div class="section" id="azuredatalakehook">
<span id="id5"></span><h4 class="sigil_not_in_toc">AzureDataLakeHook</h4>

<pre>
class airflow.contrib.hooks.azure_data_lake_hook.AzureDataLakeHook(azure_data_lake_conn_id=&apos;azure_data_lake_default&apos;)</pre>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.hooks.base_hook.BaseHook</span></code></p>
<p>Interacts with Azure Data Lake.</p>
<p>Client ID and client secret should be in the user and password parameters.
Tenant and account name should be in the extra field as
{&#x201C;tenant&#x201D;: &#x201C;&lt;TENANT&gt;&#x201D;, &#x201C;account_name&#x201D;: &#x201C;&lt;ACCOUNT_NAME&gt;&#x201D;}.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>azure_data_lake_conn_id</strong> (<em>str</em>) &#x2013; Reference to the Azure Data Lake connection.</td>
</tr>
</tbody>
</table>

<pre>
check_for_file(file_path)</pre>
<p>Check if a file exists on Azure Data Lake.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>file_path</strong> (<em>str</em>) &#x2013; Path and name of the file.</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body">True if the file exists, False otherwise.</td>
</tr>
</tbody>
</table>
<p>Return type: <cite>bool</cite></p>




<pre>
download_file(local_path, remote_path, nthreads=64, overwrite=True, buffersize=4194304, blocksize=4194304)</pre>
<p>Download a file from Azure Data Lake.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>local_path</strong> (<em>str</em>) &#x2013; local path. If downloading a single file, will write to this
specific file, unless it is an existing directory, in which case a file is
created within it. If downloading multiple files, this is the root
directory to write within. Will create directories as required.</li>
<li><strong>remote_path</strong> (<em>str</em>) &#x2013; remote path/globstring to use to find remote files.
Recursive glob patterns using <cite>**</cite> are not supported.</li>
<li><strong>nthreads</strong> (<em>int</em>) &#x2013; Number of threads to use. If None, uses the number of cores.</li>
<li><strong>overwrite</strong> (<em>bool</em>) &#x2013; Whether to forcibly overwrite existing files/directories.
If False and remote path is a directory, will quit regardless if any files
would be overwritten or not. If True, only matching filenames are actually
overwritten.</li>
<li><strong>buffersize</strong> (<em>int</em>) &#x2013; int [2**22]
Number of bytes for internal buffer. This block cannot be bigger than
a chunk and cannot be smaller than a block.</li>
<li><strong>blocksize</strong> (<em>int</em>) &#x2013; int [2**22]
Number of bytes for a block. Within each chunk, we write a smaller
block for each API call. This block cannot be bigger than a chunk.</li>
</ul>
</td>
</tr>
</tbody>
</table>
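<p>The default <cite>buffersize</cite> and <cite>blocksize</cite> of 4194304 bytes are 2**22, i.e. 4&#160;MiB. As an illustration of the chunking arithmetic (not the library&#x2019;s implementation), the number of fixed-size blocks needed to cover a file is a ceiling division:</p>

```python
BLOCKSIZE = 4194304  # the documented default, 2**22 bytes (4 MiB)

def blocks_needed(file_size, blocksize=BLOCKSIZE):
    # Ceiling division: how many fixed-size blocks cover file_size bytes.
    return -(-file_size // blocksize)

assert BLOCKSIZE == 2 ** 22
blocks_needed(10 * 1024 * 1024)  # a 10 MiB file needs 3 blocks
```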




<pre>
get_conn()</pre>
<p>Return an AzureDLFileSystem object.</p>




<pre>
upload_file(local_path, remote_path, nthreads=64, overwrite=True, buffersize=4194304, blocksize=4194304)</pre>
<p>Upload a file to Azure Data Lake.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>local_path</strong> (<em>str</em>) &#x2013; local path. Can be single file, directory (in which case,
upload recursively) or glob pattern. Recursive glob patterns using <cite>**</cite>
are not supported.</li>
<li><strong>remote_path</strong> (<em>str</em>) &#x2013; Remote path to upload to; if multiple files, this is the
directory root to write within.</li>
<li><strong>nthreads</strong> (<em>int</em>) &#x2013; Number of threads to use. If None, uses the number of cores.</li>
<li><strong>overwrite</strong> (<em>bool</em>) &#x2013; Whether to forcibly overwrite existing files/directories.
If False and remote path is a directory, will quit regardless if any files
would be overwritten or not. If True, only matching filenames are actually
overwritten.</li>
<li><strong>buffersize</strong> (<em>int</em>) &#x2013; int [2**22]
Number of bytes for internal buffer. This block cannot be bigger than
a chunk and cannot be smaller than a block.</li>
<li><strong>blocksize</strong> (<em>int</em>) &#x2013; int [2**22]
Number of bytes for a block. Within each chunk, we write a smaller
block for each API call. This block cannot be bigger than a chunk.</li>
</ul>
</td>
</tr>
</tbody>
</table>
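<p>The upload above can be sketched as follows. This is an illustrative example, not part of the API reference: the connection id <cite>azure_data_lake_default</cite> and both paths are assumptions.</p>

```python
# Hypothetical sketch: uploading local files with AzureDataLakeHook.upload_file.
# The connection id and the paths below are assumptions, not part of the docs.
from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook

hook = AzureDataLakeHook(azure_data_lake_conn_id='azure_data_lake_default')
hook.upload_file(
    local_path='/tmp/reports/*.csv',   # glob pattern: uploads every matching file
    remote_path='raw/reports',         # directory root to write within on the lake
    nthreads=64,
    overwrite=True,
)
```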






</div>
</div>
</div>
<div class="section" id="aws-amazon-web-services">
<span id="aws"></span><h2 class="sigil_not_in_toc">AWS: Amazon Web Services</h2>
<p>Airflow has extensive support for Amazon Web Services, but note that the Hooks, Sensors and
Operators live in the contrib section.</p>
<div class="section" id="aws-emr">
<h3 class="sigil_not_in_toc">AWS EMR</h3>
<ul class="simple">
<li><a class="reference internal" href="#emraddstepsoperator"><span class="std std-ref">EmrAddStepsOperator</span></a> : Adds steps to an existing EMR JobFlow.</li>
<li><a class="reference internal" href="#emrcreatejobflowoperator"><span class="std std-ref">EmrCreateJobFlowOperator</span></a> : Creates an EMR JobFlow, reading the config from the EMR connection.</li>
<li><a class="reference internal" href="#emrterminatejobflowoperator"><span class="std std-ref">EmrTerminateJobFlowOperator</span></a> : Terminates an EMR JobFlow.</li>
<li><a class="reference internal" href="#emrhook"><span class="std std-ref">EmrHook</span></a> : Interact with AWS EMR.</li>
</ul>
<div class="section" id="emraddstepsoperator">
<span id="id6"></span><h4 class="sigil_not_in_toc">EmrAddStepsOperator</h4>

<pre>
class airflow.contrib.operators.emr_add_steps_operator.EmrAddStepsOperator(job_flow_id, aws_conn_id=&apos;s3_default&apos;, steps=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>An operator that adds steps to an existing EMR job_flow.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>job_flow_id</strong> &#x2013; id of the JobFlow to add steps to. (templated)</li>
<li><strong>aws_conn_id</strong> (<em>str</em>) &#x2013; aws connection to use</li>
<li><strong>steps</strong> (<em>list</em>) &#x2013; boto3 style steps to be added to the jobflow. (templated)</li>
</ul>
</td>
</tr>
</tbody>
</table>
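<p>The <cite>steps</cite> argument is a plain list of boto3-style step dicts. A minimal sketch (the step name, jar and spark-submit arguments are illustrative assumptions):</p>

```python
# boto3-style steps, as passed to EmrAddStepsOperator(steps=SPARK_STEPS, ...).
# The step name and spark-submit arguments are illustrative assumptions.
SPARK_STEPS = [
    {
        'Name': 'run_spark_job',
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': ['spark-submit', '--deploy-mode', 'cluster',
                     's3://my-bucket/job.py'],
        },
    },
]
```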



</div>
<div class="section" id="emrcreatejobflowoperator">
<span id="id7"></span><h4 class="sigil_not_in_toc">EmrCreateJobFlowOperator</h4>

<pre>
class airflow.contrib.operators.emr_create_job_flow_operator.EmrCreateJobFlowOperator(aws_conn_id=&apos;s3_default&apos;, emr_conn_id=&apos;emr_default&apos;, job_flow_overrides=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Creates an EMR JobFlow, reading the config from the EMR connection.
A dictionary of JobFlow overrides can be passed that override
the config from the connection.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>aws_conn_id</strong> (<em>str</em>) &#x2013; aws connection to use</li>
<li><strong>emr_conn_id</strong> (<em>str</em>) &#x2013; emr connection to use</li>
<li><strong>job_flow_overrides</strong> &#x2013; boto3 style arguments to override
emr_connection extra. (templated)</li>
</ul>
</td>
</tr>
</tbody>
</table>
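<p><cite>job_flow_overrides</cite> is likewise a plain boto3-style dict merged over the EMR connection extra. A sketch (cluster name and instance settings are illustrative assumptions):</p>

```python
# boto3 run_job_flow-style overrides, as passed to
# EmrCreateJobFlowOperator(job_flow_overrides=JOB_FLOW_OVERRIDES, ...).
# The cluster name and instance settings are illustrative assumptions.
JOB_FLOW_OVERRIDES = {
    'Name': 'airflow-cluster',
    'Instances': {
        'InstanceCount': 3,
        'KeepJobFlowAliveWhenNoSteps': True,
    },
}
```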



</div>
<div class="section" id="emrterminatejobflowoperator">
<span id="id8"></span><h4 class="sigil_not_in_toc">EmrTerminateJobFlowOperator</h4>

<pre>
class airflow.contrib.operators.emr_terminate_job_flow_operator.EmrTerminateJobFlowOperator(job_flow_id, aws_conn_id=&apos;s3_default&apos;, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Operator to terminate EMR JobFlows.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>job_flow_id</strong> &#x2013; id of the JobFlow to terminate. (templated)</li>
<li><strong>aws_conn_id</strong> (<em>str</em>) &#x2013; aws connection to use</li>
</ul>
</td>
</tr>
</tbody>
</table>
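<p>Because <cite>job_flow_id</cite> is templated, the id returned by a create task can be pulled from XCom at runtime, letting the three EMR operators be chained into one pipeline. A hedged sketch (DAG name, task ids and connection ids are assumptions):</p>

```python
# Hypothetical wiring of the three EMR operators; ids and names are assumptions.
from datetime import datetime
from airflow import DAG
from airflow.contrib.operators.emr_create_job_flow_operator import EmrCreateJobFlowOperator
from airflow.contrib.operators.emr_add_steps_operator import EmrAddStepsOperator
from airflow.contrib.operators.emr_terminate_job_flow_operator import EmrTerminateJobFlowOperator

dag = DAG('emr_example', start_date=datetime(2018, 1, 1), schedule_interval=None)

create = EmrCreateJobFlowOperator(
    task_id='create_cluster',
    aws_conn_id='aws_default',
    emr_conn_id='emr_default',
    dag=dag)
add = EmrAddStepsOperator(
    task_id='add_steps',
    # job_flow_id is templated, so the id returned by the create task
    # can be pulled from XCom at runtime:
    job_flow_id="{{ task_instance.xcom_pull(task_ids='create_cluster', key='return_value') }}",
    steps=[],  # boto3-style step dicts go here
    dag=dag)
terminate = EmrTerminateJobFlowOperator(
    task_id='terminate_cluster',
    job_flow_id="{{ task_instance.xcom_pull(task_ids='create_cluster', key='return_value') }}",
    aws_conn_id='aws_default',
    dag=dag)

create >> add >> terminate
```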



</div>
<div class="section" id="emrhook">
<span id="id9"></span><h4 class="sigil_not_in_toc">EmrHook</h4>

<pre>
class airflow.contrib.hooks.emr_hook.EmrHook(emr_conn_id=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.contrib.hooks.aws_hook.AwsHook" title="airflow.contrib.hooks.aws_hook.AwsHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.contrib.hooks.aws_hook.AwsHook</span></code></a></p>
<p>Interact with AWS EMR. emr_conn_id is only necessary for using the
create_job_flow method.</p>

<pre>
create_job_flow(job_flow_overrides)</pre>
<p>Creates a job flow using the config from the EMR connection.
Keys of the extra JSON field may contain arguments for the boto3
run_job_flow method.
Overrides for this config may be passed as job_flow_overrides.</p>






</div>
</div>
<div class="section" id="aws-s3">
<h3 class="sigil_not_in_toc">AWS S3</h3>
<ul class="simple">
<li><a class="reference internal" href="#s3hook"><span class="std std-ref">S3Hook</span></a> : Interact with AWS S3.</li>
<li><a class="reference internal" href="#s3filetransformoperator"><span class="std std-ref">S3FileTransformOperator</span></a> : Copies data from a source S3 location to a temporary location on the local filesystem.</li>
<li><a class="reference internal" href="#s3listoperator"><span class="std std-ref">S3ListOperator</span></a> : Lists the files matching a key prefix from a S3 location.</li>
<li><a class="reference internal" href="#s3togooglecloudstorageoperator"><span class="std std-ref">S3ToGoogleCloudStorageOperator</span></a> : Syncs an S3 location with a Google Cloud Storage bucket.</li>
<li><a class="reference internal" href="#s3tohivetransfer"><span class="std std-ref">S3ToHiveTransfer</span></a> : Moves data from S3 to Hive. The operator downloads a file from S3, stores the file locally before loading it into a Hive table.</li>
</ul>
<div class="section" id="s3hook">
<span id="id10"></span><h4 class="sigil_not_in_toc">S3Hook</h4>

<pre>
class airflow.hooks.S3_hook.S3Hook(aws_conn_id=&apos;aws_default&apos;)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.contrib.hooks.aws_hook.AwsHook" title="airflow.contrib.hooks.aws_hook.AwsHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.contrib.hooks.aws_hook.AwsHook</span></code></a></p>
<p>Interact with AWS S3, using the boto3 library.</p>

<pre>
check_for_bucket(bucket_name)</pre>
<p>Check if bucket_name exists.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>bucket_name</strong> (<em>str</em>) &#x2013; the name of the bucket</td>
</tr>
</tbody>
</table>




<pre>
check_for_key(key, bucket_name=None)</pre>
<p>Checks if a key exists in a bucket</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>key</strong> (<em>str</em>) &#x2013; S3 key that will point to the file</li>
<li><strong>bucket_name</strong> (<em>str</em>) &#x2013; Name of the bucket in which the file is stored</li>
</ul>
</td>
</tr>
</tbody>
</table>
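<p>A quick sketch combining the checks above with a read; the bucket and key names are assumptions, not part of the API reference:</p>

```python
# Hypothetical sketch of basic S3Hook usage; bucket and key are assumptions.
from airflow.hooks.S3_hook import S3Hook

hook = S3Hook(aws_conn_id='aws_default')
if hook.check_for_bucket('my-bucket') and \
        hook.check_for_key('data/file.csv', bucket_name='my-bucket'):
    # read_key returns the object's content as a string
    print(hook.read_key('data/file.csv', bucket_name='my-bucket'))
```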




<pre>
check_for_prefix(bucket_name, prefix, delimiter)</pre>
<p>Checks that a prefix exists in a bucket</p>




<pre>
check_for_wildcard_key(wildcard_key, bucket_name=None, delimiter=&apos;&apos;)</pre>
<p>Checks that a key matching a wildcard expression exists in a bucket</p>
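<p>The wildcard expression follows Unix shell globbing. A stdlib sketch of the matching semantics (this illustrates the matching rule only, not the hook's actual implementation; the key names are made up):</p>

```python
# Illustration of the Unix-wildcard semantics used by check_for_wildcard_key;
# a sketch of the matching rule, not the hook's implementation.
import fnmatch

keys = ['logs/2018/04/01.gz', 'logs/2018/05/01.gz', 'data/a.csv']
matches = [k for k in keys if fnmatch.fnmatch(k, 'logs/2018/*/01.gz')]
# matches the two log keys; 'data/a.csv' does not match the pattern
```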




<pre>
get_bucket(bucket_name)</pre>
<p>Returns a boto3.S3.Bucket object</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>bucket_name</strong> (<em>str</em>) &#x2013; the name of the bucket</td>
</tr>
</tbody>
</table>




<pre>
get_key(key, bucket_name=None)</pre>
<p>Returns a boto3.s3.Object</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>key</strong> (<em>str</em>) &#x2013; the path to the key</li>
<li><strong>bucket_name</strong> (<em>str</em>) &#x2013; the name of the bucket</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
get_wildcard_key(wildcard_key, bucket_name=None, delimiter=&apos;&apos;)</pre>
<p>Returns a boto3.s3.Object object matching the wildcard expression</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>wildcard_key</strong> (<em>str</em>) &#x2013; the path to the key</li>
<li><strong>bucket_name</strong> (<em>str</em>) &#x2013; the name of the bucket</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
list_keys(bucket_name, prefix=&apos;&apos;, delimiter=&apos;&apos;, page_size=None, max_items=None)</pre>
<p>Lists keys in a bucket under prefix and not containing delimiter</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket_name</strong> (<em>str</em>) &#x2013; the name of the bucket</li>
<li><strong>prefix</strong> (<em>str</em>) &#x2013; a key prefix</li>
<li><strong>delimiter</strong> (<em>str</em>) &#x2013; the delimiter marks key hierarchy.</li>
<li><strong>page_size</strong> (<em>int</em>) &#x2013; pagination size</li>
<li><strong>max_items</strong> (<em>int</em>) &#x2013; maximum items to return</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
list_prefixes(bucket_name, prefix=&apos;&apos;, delimiter=&apos;&apos;, page_size=None, max_items=None)</pre>
<p>Lists prefixes in a bucket under prefix</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket_name</strong> (<em>str</em>) &#x2013; the name of the bucket</li>
<li><strong>prefix</strong> (<em>str</em>) &#x2013; a key prefix</li>
<li><strong>delimiter</strong> (<em>str</em>) &#x2013; the delimiter marks key hierarchy.</li>
<li><strong>page_size</strong> (<em>int</em>) &#x2013; pagination size</li>
<li><strong>max_items</strong> (<em>int</em>) &#x2013; maximum items to return</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
load_bytes(bytes_data, key, bucket_name=None, replace=False, encrypt=False)</pre>
<p>Loads bytes to S3</p>
<p>This is provided as a convenience to drop bytes in S3. It uses the
boto infrastructure to ship a file to S3.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bytes_data</strong> (<em>bytes</em>) &#x2013; bytes to set as content for the key.</li>
<li><strong>key</strong> (<em>str</em>) &#x2013; S3 key that will point to the file</li>
<li><strong>bucket_name</strong> (<em>str</em>) &#x2013; Name of the bucket in which to store the file</li>
<li><strong>replace</strong> (<em>bool</em>) &#x2013; A flag to decide whether or not to overwrite the key
if it already exists</li>
<li><strong>encrypt</strong> (<em>bool</em>) &#x2013; If True, the file will be encrypted on the server-side
by S3 and will be stored in an encrypted form while at rest in S3.</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
load_file(filename, key, bucket_name=None, replace=False, encrypt=False)</pre>
<p>Loads a local file to S3</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>filename</strong> (<em>str</em>) &#x2013; name of the file to load.</li>
<li><strong>key</strong> (<em>str</em>) &#x2013; S3 key that will point to the file</li>
<li><strong>bucket_name</strong> (<em>str</em>) &#x2013; Name of the bucket in which to store the file</li>
<li><strong>replace</strong> (<em>bool</em>) &#x2013; A flag to decide whether or not to overwrite the key
if it already exists. If replace is False and the key exists, an
error will be raised.</li>
<li><strong>encrypt</strong> (<em>bool</em>) &#x2013; If True, the file will be encrypted on the server-side
by S3 and will be stored in an encrypted form while at rest in S3.</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
load_string(string_data, key, bucket_name=None, replace=False, encrypt=False, encoding=&apos;utf-8&apos;)</pre>
<p>Loads a string to S3</p>
<p>This is provided as a convenience to drop a string in S3. It uses the
boto infrastructure to ship a file to S3.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>string_data</strong> (<em>str</em>) &#x2013; string to set as content for the key.</li>
<li><strong>key</strong> (<em>str</em>) &#x2013; S3 key that will point to the file</li>
<li><strong>bucket_name</strong> (<em>str</em>) &#x2013; Name of the bucket in which to store the file</li>
<li><strong>replace</strong> (<em>bool</em>) &#x2013; A flag to decide whether or not to overwrite the key
if it already exists</li>
<li><strong>encrypt</strong> (<em>bool</em>) &#x2013; If True, the file will be encrypted on the server-side
by S3 and will be stored in an encrypted form while at rest in S3.</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
read_key(key, bucket_name=None)</pre>
<p>Reads a key from S3</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>key</strong> (<em>str</em>) &#x2013; S3 key that will point to the file</li>
<li><strong>bucket_name</strong> (<em>str</em>) &#x2013; Name of the bucket in which the file is stored</li>
</ul>
</td>
</tr>
</tbody>
</table>
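<p><cite>load_string</cite> and <cite>read_key</cite> form a natural round trip. A hedged sketch (bucket and key are assumptions):</p>

```python
# Hypothetical round trip with the hook; bucket and key are assumptions.
from airflow.hooks.S3_hook import S3Hook

hook = S3Hook(aws_conn_id='aws_default')
hook.load_string('hello', key='tmp/greeting.txt',
                 bucket_name='my-bucket', replace=True)
# read_key returns the stored content as a string
assert hook.read_key('tmp/greeting.txt', bucket_name='my-bucket') == 'hello'
```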




<pre>
select_key(key, bucket_name=None, expression=&apos;SELECT * FROM S3Object&apos;, expression_type=&apos;SQL&apos;, input_serialization={&apos;CSV&apos;: {}}, output_serialization={&apos;CSV&apos;: {}})</pre>
<p>Reads a key with S3 Select.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>key</strong> (<em>str</em>) &#x2013; S3 key that will point to the file</li>
<li><strong>bucket_name</strong> (<em>str</em>) &#x2013; Name of the bucket in which the file is stored</li>
<li><strong>expression</strong> (<em>str</em>) &#x2013; S3 Select expression</li>
<li><strong>expression_type</strong> (<em>str</em>) &#x2013; S3 Select expression type</li>
<li><strong>input_serialization</strong> (<em>dict</em>) &#x2013; S3 Select input data serialization format</li>
<li><strong>output_serialization</strong> (<em>dict</em>) &#x2013; S3 Select output data serialization format</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body"><p class="first">retrieved subset of original data by S3 Select</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">Return type:</th>
<td class="field-body"><p class="first last">str</p>
</td>
</tr>
</tbody>
</table>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last">For more details about S3 Select parameters:
<a class="reference external" href="http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.select_object_content">http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.select_object_content</a></p>
</div>
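<p>The serialization arguments are plain boto3 <cite>select_object_content</cite> dicts. A sketch reading gzip-compressed CSV and returning JSON (the formats chosen here are illustrative assumptions):</p>

```python
# boto3-style serialization dicts for select_key; this particular combination
# (gzipped CSV in, JSON out) is an illustrative assumption.
input_serialization = {
    'CSV': {'FileHeaderInfo': 'USE'},   # treat the first line as column names
    'CompressionType': 'GZIP',
}
output_serialization = {'JSON': {}}
# e.g. hook.select_key(key, expression="SELECT s.name FROM S3Object s",
#                      input_serialization=input_serialization,
#                      output_serialization=output_serialization)
```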






</div>
<div class="section" id="s3filetransformoperator">
<span id="id11"></span><h4 class="sigil_not_in_toc">S3FileTransformOperator</h4>

<pre>
class airflow.operators.s3_file_transform_operator.S3FileTransformOperator(source_s3_key, dest_s3_key, transform_script=None, select_expression=None, source_aws_conn_id=&apos;aws_default&apos;, dest_aws_conn_id=&apos;aws_default&apos;, replace=False, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Copies data from a source S3 location to a temporary location on the
local filesystem. Runs a transformation on this file as specified by
the transformation script and uploads the output to a destination S3
location.</p>
<p>The locations of the source and the destination files in the local
filesystem are provided as the first and second arguments to the
transformation script. The transformation script is expected to read the
data from source, transform it and write the output to the local
destination file. The operator then takes over control and uploads the
local destination file to S3.</p>
<p>S3 Select is also available to filter the source contents. Users can
omit the transformation script if an S3 Select expression is specified.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>source_s3_key</strong> (<em>str</em>) &#x2013; The key to be retrieved from S3. (templated)</li>
<li><strong>source_aws_conn_id</strong> (<em>str</em>) &#x2013; source s3 connection</li>
<li><strong>dest_s3_key</strong> (<em>str</em>) &#x2013; The key to be written from S3. (templated)</li>
<li><strong>dest_aws_conn_id</strong> (<em>str</em>) &#x2013; destination s3 connection</li>
<li><strong>replace</strong> (<em>bool</em>) &#x2013; Replace dest S3 key if it already exists</li>
<li><strong>transform_script</strong> (<em>str</em>) &#x2013; location of the executable transformation script</li>
<li><strong>select_expression</strong> (<em>str</em>) &#x2013; S3 Select expression</li>
</ul>
</td>
</tr>
</tbody>
</table>
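<p>A minimal transform script, following the calling convention above (source path as the first argument, destination path as the second). The upper-casing transform is an illustrative assumption:</p>

```python
# Minimal transform-script sketch for S3FileTransformOperator; the operator
# invokes it as: transform.py <source_path> <dest_path>.
# The upper-casing transform itself is an illustrative assumption.
import sys

def transform(source_path, dest_path):
    # Read the file downloaded from S3, upper-case it, and write the result;
    # the operator then uploads dest_path to the destination S3 key.
    with open(source_path) as src, open(dest_path, 'w') as dst:
        dst.write(src.read().upper())

if __name__ == '__main__' and len(sys.argv) >= 3:
    transform(sys.argv[1], sys.argv[2])
```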



</div>
<div class="section" id="s3listoperator">
<span id="id12"></span><h4 class="sigil_not_in_toc">S3ListOperator</h4>

<pre>
class airflow.contrib.operators.s3_list_operator.S3ListOperator(bucket, prefix=&apos;&apos;, delimiter=&apos;&apos;, aws_conn_id=&apos;aws_default&apos;, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>List all objects from the bucket with the given string prefix in name.</p>
<p>This operator returns a Python list with the names of the objects, which can be
passed to downstream tasks via <cite>xcom</cite>.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; The S3 bucket in which to find the objects. (templated)</li>
<li><strong>prefix</strong> (<em>string</em>) &#x2013; Prefix string to filter the objects whose names begin
with it. (templated)</li>
<li><strong>delimiter</strong> (<em>string</em>) &#x2013; the delimiter marks key hierarchy. (templated)</li>
<li><strong>aws_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use when connecting to S3 storage.</li>
</ul>
</td>
</tr>
</tbody>
</table>

<p><strong>Example</strong>:</p>
<p class="first">The following operator would list all the files
(excluding subfolders) from the S3
<code class="docutils literal notranslate"><span class="pre">customers/2018/04/</span></code> key in the <code class="docutils literal notranslate"><span class="pre">data</span></code> bucket.</p>
<div class="last highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">s3_file</span> <span class="o">=</span> <span class="n">S3ListOperator</span><span class="p">(</span>
    <span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;list_s3_files&apos;</span><span class="p">,</span>
    <span class="n">bucket</span><span class="o">=</span><span class="s1">&apos;data&apos;</span><span class="p">,</span>
    <span class="n">prefix</span><span class="o">=</span><span class="s1">&apos;customers/2018/04/&apos;</span><span class="p">,</span>
    <span class="n">delimiter</span><span class="o">=</span><span class="s1">&apos;/&apos;</span><span class="p">,</span>
    <span class="n">aws_conn_id</span><span class="o">=</span><span class="s1">&apos;aws_customers_conn&apos;</span>
<span class="p">)</span>
</pre>
</div>
</div>





</div>
<div class="section" id="s3togooglecloudstorageoperator">
<span id="id13"></span><h4 class="sigil_not_in_toc">S3ToGoogleCloudStorageOperator</h4>

<pre>
class airflow.contrib.operators.s3_to_gcs_operator.S3ToGoogleCloudStorageOperator(bucket, prefix=&apos;&apos;, delimiter=&apos;&apos;, aws_conn_id=&apos;aws_default&apos;, dest_gcs_conn_id=None, dest_gcs=None, delegate_to=None, replace=False, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="#airflow.contrib.operators.s3_list_operator.S3ListOperator" title="airflow.contrib.operators.s3_list_operator.S3ListOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.contrib.operators.s3_list_operator.S3ListOperator</span></code></a></p>
<p>Synchronizes an S3 key, possibly a prefix, with a Google Cloud Storage
destination path.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; The S3 bucket in which to find the objects. (templated)</li>
<li><strong>prefix</strong> (<em>string</em>) &#x2013; Prefix string which filters objects whose names begin
with it. (templated)</li>
<li><strong>delimiter</strong> (<em>string</em>) &#x2013; the delimiter marks key hierarchy. (templated)</li>
<li><strong>aws_conn_id</strong> (<em>string</em>) &#x2013; The source S3 connection</li>
<li><strong>dest_gcs_conn_id</strong> (<em>string</em>) &#x2013; The destination connection ID to use
when connecting to Google Cloud Storage.</li>
<li><strong>dest_gcs</strong> (<em>string</em>) &#x2013; The destination Google Cloud Storage bucket and prefix
where you want to store the files. (templated)</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have
domain-wide delegation enabled.</li>
<li><strong>replace</strong> (<em>bool</em>) &#x2013; Whether you want to replace existing destination files
or not.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p><strong>Example</strong>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>s3_to_gcs_op = S3ToGoogleCloudStorageOperator(
    task_id='s3_to_gcs_example',
    bucket='my-s3-bucket',
    prefix='data/customers-201804',
    dest_gcs_conn_id='google_cloud_default',
    dest_gcs='gs://my.gcs.bucket/some/customers/',
    replace=False,
    dag=my_dag)
</pre>
</div>
</div>
<p>Note that <code class="docutils literal notranslate"><span class="pre">bucket</span></code>, <code class="docutils literal notranslate"><span class="pre">prefix</span></code>, <code class="docutils literal notranslate"><span class="pre">delimiter</span></code> and <code class="docutils literal notranslate"><span class="pre">dest_gcs</span></code> are
templated, so you can use variables in them if you wish.</p>



</div>
<div class="section" id="s3tohivetransfer">
<span id="id14"></span><h4 class="sigil_not_in_toc">S3ToHiveTransfer</h4>

<pre>
class airflow.operators.s3_to_hive_operator.S3ToHiveTransfer(s3_key, field_dict, hive_table, delimiter=&apos;, &apos;, create=True, recreate=False, partition=None, headers=False, check_headers=False, wildcard_match=False, aws_conn_id=&apos;aws_default&apos;, hive_cli_conn_id=&apos;hive_cli_default&apos;, input_compressed=False, tblproperties=None, select_expression=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Moves data from S3 to Hive. The operator downloads a file from S3,
stores the file locally before loading it into a Hive table.
If the <code class="docutils literal notranslate"><span class="pre">create</span></code> or <code class="docutils literal notranslate"><span class="pre">recreate</span></code> arguments are set to <code class="docutils literal notranslate"><span class="pre">True</span></code>,
<code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">TABLE</span></code> and <code class="docutils literal notranslate"><span class="pre">DROP</span> <span class="pre">TABLE</span></code> statements are generated.
Hive data types are inferred from the cursor&#x2019;s metadata.</p>
<p>Note that the table generated in Hive uses <code class="docutils literal notranslate"><span class="pre">STORED</span> <span class="pre">AS</span> <span class="pre">textfile</span></code>
which isn&#x2019;t the most efficient serialization format. If a
large amount of data is loaded and/or if the table gets
queried considerably, you may want to use this operator only to
stage the data into a temporary table before loading it into its
final destination using a <code class="docutils literal notranslate"><span class="pre">HiveOperator</span></code>.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>s3_key</strong> (<em>str</em>) &#x2013; The key to be retrieved from S3. (templated)</li>
<li><strong>field_dict</strong> (<em>dict</em>) &#x2013; A dictionary with the field names in the file
as keys and their Hive types as values</li>
<li><strong>hive_table</strong> (<em>str</em>) &#x2013; target Hive table, use dot notation to target a
specific database. (templated)</li>
<li><strong>create</strong> (<em>bool</em>) &#x2013; whether to create the table if it doesn&#x2019;t exist</li>
<li><strong>recreate</strong> (<em>bool</em>) &#x2013; whether to drop and recreate the table at every
execution</li>
<li><strong>partition</strong> (<em>dict</em>) &#x2013; target partition as a dict of partition columns
and values. (templated)</li>
<li><strong>headers</strong> (<em>bool</em>) &#x2013; whether the file contains column names on the first
line</li>
<li><strong>check_headers</strong> (<em>bool</em>) &#x2013; whether the column names on the first line should be
checked against the keys of field_dict</li>
<li><strong>wildcard_match</strong> (<em>bool</em>) &#x2013; whether the s3_key should be interpreted as a Unix
wildcard pattern</li>
<li><strong>delimiter</strong> (<em>str</em>) &#x2013; field delimiter in the file</li>
<li><strong>aws_conn_id</strong> (<em>str</em>) &#x2013; source s3 connection</li>
<li><strong>hive_cli_conn_id</strong> (<em>str</em>) &#x2013; destination hive connection</li>
<li><strong>input_compressed</strong> (<em>bool</em>) &#x2013; Boolean to determine if file decompression is
required to process headers</li>
<li><strong>tblproperties</strong> (<em>dict</em>) &#x2013; TBLPROPERTIES of the hive table being created</li>
<li><strong>select_expression</strong> (<em>str</em>) &#x2013; S3 Select expression</li>
</ul>
</td>
</tr>
</tbody>
</table>
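<p>A sketch of a <cite>field_dict</cite>. Since the mapping describes columns in file order, an <cite>OrderedDict</cite> keeps that order explicit; the column names and Hive types here are illustrative assumptions:</p>

```python
# field_dict maps file column names to Hive types, in column order.
# The column names and types below are illustrative assumptions.
from collections import OrderedDict

field_dict = OrderedDict([
    ('id', 'BIGINT'),
    ('name', 'STRING'),
    ('created_at', 'TIMESTAMP'),
])
# e.g. S3ToHiveTransfer(task_id='s3_to_hive', s3_key='data/{{ ds }}/export.csv',
#                       field_dict=field_dict, hive_table='staging.events', ...)
```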



</div>
</div>
<div class="section" id="aws-ec2-container-service">
<h3 class="sigil_not_in_toc">AWS EC2 Container Service</h3>
<ul class="simple">
<li><a class="reference internal" href="#ecsoperator"><span class="std std-ref">ECSOperator</span></a> : Execute a task on AWS EC2 Container Service.</li>
</ul>
<div class="section" id="ecsoperator">
<span id="id15"></span><h4 class="sigil_not_in_toc">ECSOperator</h4>

<pre>
class airflow.contrib.operators.ecs_operator.ECSOperator(task_definition, cluster, overrides, aws_conn_id=None, region_name=None, launch_type=&apos;EC2&apos;, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Execute a task on AWS EC2 Container Service</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>task_definition</strong> (<em>str</em>) &#x2013; the task definition name on EC2 Container Service</li>
<li><strong>cluster</strong> (<em>str</em>) &#x2013; the cluster name on EC2 Container Service</li>
<li><strong>aws_conn_id</strong> (<em>str</em>) &#x2013; connection id of AWS credentials / region name. If None,
the default boto3 credential strategy will be used
(<a class="reference external" href="http://boto3.readthedocs.io/en/latest/guide/configuration.html">http://boto3.readthedocs.io/en/latest/guide/configuration.html</a>).</li>
<li><strong>region_name</strong> &#x2013; region name to use in the AWS Hook.
Overrides the region_name in the connection (if provided).</li>
<li><strong>launch_type</strong> &#x2013; the launch type on which to run your task (&#x2018;EC2&#x2019; or &#x2018;FARGATE&#x2019;)</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Param:</th>
<td class="field-body"><p class="first">overrides: the same parameter that boto3 will receive (templated):
<a class="reference external" href="http://boto3.readthedocs.org/en/latest/reference/services/ecs.html#ECS.Client.run_task">http://boto3.readthedocs.org/en/latest/reference/services/ecs.html#ECS.Client.run_task</a></p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">Type:</th>
<td class="field-body"><p class="first">overrides: dict</p>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Type:</th>
<td class="field-body"><p class="first last">launch_type: str</p>
</td>
</tr>
</tbody>
</table>
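<p>For illustration, a hypothetical <cite>overrides</cite> payload might look like the following; the container name, command and environment values are assumptions, not values from the docs:</p>

```python
# Hypothetical "overrides" payload for ECSOperator. It has the same shape as
# the "overrides" parameter of boto3's ECS.Client.run_task; the container
# name, command and environment values here are illustrative only.
overrides = {
    "containerOverrides": [
        {
            "name": "my-container",               # must match a container in the task definition
            "command": ["python", "process.py"],  # replaces the image's default command
            "environment": [
                {"name": "STAGE", "value": "staging"},
            ],
        }
    ]
}
```

<p>Because the field is templated, Jinja expressions such as <cite>{{ ds }}</cite> may be embedded in the command arguments.</p>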



</div>
</div>
<div class="section" id="aws-batch-service">
<h3 class="sigil_not_in_toc">AWS Batch Service</h3>
<ul class="simple">
<li><a class="reference internal" href="#awsbatchoperator"><span class="std std-ref">AWSBatchOperator</span></a> : Execute a task on AWS Batch Service.</li>
</ul>
<div class="section" id="awsbatchoperator">
<span id="id16"></span><h4 class="sigil_not_in_toc">AWSBatchOperator</h4>

<pre>
class airflow.contrib.operators.awsbatch_operator.AWSBatchOperator(job_name, job_definition, job_queue, overrides, max_retries=4200, aws_conn_id=None, region_name=None, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Execute a job on AWS Batch Service</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>job_name</strong> (<em>str</em>) &#x2013; the name for the job that will run on AWS Batch</li>
<li><strong>job_definition</strong> (<em>str</em>) &#x2013; the job definition name on AWS Batch</li>
<li><strong>job_queue</strong> (<em>str</em>) &#x2013; the queue name on AWS Batch</li>
<li><strong>max_retries</strong> (<em>int</em>) &#x2013; number of exponential backoff retries while polling for job completion (used because a boto3 waiter is not yet available); the default of 4200 corresponds to roughly 48 hours</li>
<li><strong>aws_conn_id</strong> (<em>str</em>) &#x2013; connection id of AWS credentials / region name. If None,
the default boto3 credential strategy will be used
(<a class="reference external" href="http://boto3.readthedocs.io/en/latest/guide/configuration.html">http://boto3.readthedocs.io/en/latest/guide/configuration.html</a>).</li>
<li><strong>region_name</strong> &#x2013; region name to use in the AWS Hook.
Overrides the region_name in the connection (if provided).</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Param:</th>
<td class="field-body"><p class="first">overrides: the same parameter that boto3 will receive on
containerOverrides (templated):
<a class="reference external" href="http://boto3.readthedocs.io/en/latest/reference/services/batch.html#submit_job">http://boto3.readthedocs.io/en/latest/reference/services/batch.html#submit_job</a></p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">Type:</th>
<td class="field-body"><p class="first last">overrides: dict</p>
</td>
</tr>
</tbody>
</table>
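<p>For illustration, a hypothetical <cite>overrides</cite> payload might look like the following, since the docs above state it is forwarded as the <cite>containerOverrides</cite> argument of boto3&#8217;s <cite>submit_job</cite>; the command and environment values are assumptions:</p>

```python
# Hypothetical payload for AWSBatchOperator's "overrides" argument, forwarded
# as containerOverrides to boto3's Batch.Client.submit_job; the command and
# environment values here are illustrative only.
container_overrides = {
    "command": ["python", "etl.py", "--date", "{{ ds }}"],  # "{{ ds }}" is Jinja-templated by Airflow
    "environment": [
        {"name": "INPUT_FILE", "value": "s3://my-bucket/input.csv"},
    ],
}
```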



</div>
</div>
<div class="section" id="aws-redshift">
<h3 class="sigil_not_in_toc">AWS RedShift</h3>
<ul class="simple">
<li><a class="reference internal" href="#awsredshiftclustersensor"><span class="std std-ref">AwsRedshiftClusterSensor</span></a> : Waits for a Redshift cluster to reach a specific status.</li>
<li><a class="reference internal" href="#redshifthook"><span class="std std-ref">RedshiftHook</span></a> : Interact with AWS Redshift, using the boto3 library.</li>
<li><a class="reference internal" href="#redshifttos3transfer"><span class="std std-ref">RedshiftToS3Transfer</span></a> : Executes an UNLOAD command to S3 as CSV with or without headers.</li>
<li><a class="reference internal" href="#s3toredshifttransfer"><span class="std std-ref">S3ToRedshiftTransfer</span></a> : Executes a COPY command from S3 as CSV with or without headers.</li>
</ul>
<div class="section" id="awsredshiftclustersensor">
<span id="id17"></span><h4 class="sigil_not_in_toc">AwsRedshiftClusterSensor</h4>

<pre>
class airflow.contrib.sensors.aws_redshift_cluster_sensor.AwsRedshiftClusterSensor(cluster_identifier, target_status=&apos;available&apos;, aws_conn_id=&apos;aws_default&apos;, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.sensors.base_sensor_operator.BaseSensorOperator" title="airflow.sensors.base_sensor_operator.BaseSensorOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.sensors.base_sensor_operator.BaseSensorOperator</span></code></a></p>
<p>Waits for a Redshift cluster to reach a specific status.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>cluster_identifier</strong> (<em>str</em>) &#x2013; The identifier for the cluster being pinged.</li>
<li><strong>target_status</strong> (<em>str</em>) &#x2013; The cluster status desired.</li>
</ul>
</td>
</tr>
</tbody>
</table>

<pre>
poke(context)</pre>
<p>Function that sensors deriving this class should
override.</p>
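<p>A minimal sketch of the sensor&#8217;s success condition (simplified; not the sensor&#8217;s actual source):</p>

```python
# Minimal sketch (not the sensor's actual source) of the poke logic: succeed
# once the cluster reports the desired target_status, otherwise keep waiting.
def poke_sketch(current_status, target_status="available"):
    return current_status == target_status

print(poke_sketch("available"))  # True: the sensor stops waiting
print(poke_sketch("resuming"))   # False: the sensor pokes again later
```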






</div>
<div class="section" id="redshifthook">
<span id="id18"></span><h4 class="sigil_not_in_toc">RedshiftHook</h4>

<pre>
class airflow.contrib.hooks.redshift_hook.RedshiftHook(aws_conn_id=&apos;aws_default&apos;)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.contrib.hooks.aws_hook.AwsHook" title="airflow.contrib.hooks.aws_hook.AwsHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.contrib.hooks.aws_hook.AwsHook</span></code></a></p>
<p>Interact with AWS Redshift, using the boto3 library</p>

<pre>
cluster_status(cluster_identifier)</pre>
<p>Return status of a cluster</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>cluster_identifier</strong> (<em>str</em>) &#x2013; unique identifier of a cluster</td>
</tr>
</tbody>
</table>




<pre>
create_cluster_snapshot(snapshot_identifier, cluster_identifier)</pre>
<p>Creates a snapshot of a cluster</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>snapshot_identifier</strong> (<em>str</em>) &#x2013; unique identifier for a snapshot of a cluster</li>
<li><strong>cluster_identifier</strong> (<em>str</em>) &#x2013; unique identifier of a cluster</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
delete_cluster(cluster_identifier, skip_final_cluster_snapshot=True, final_cluster_snapshot_identifier=&apos;&apos;)</pre>
<p>Delete a cluster and optionally create a snapshot</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>cluster_identifier</strong> (<em>str</em>) &#x2013; unique identifier of a cluster</li>
<li><strong>skip_final_cluster_snapshot</strong> (<em>bool</em>) &#x2013; determines cluster snapshot creation</li>
<li><strong>final_cluster_snapshot_identifier</strong> (<em>str</em>) &#x2013; name of final cluster snapshot</li>
</ul>
</td>
</tr>
</tbody>
</table>
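<p>As a sketch of how these parameters fit together, the keyword arguments below mirror the parameters of boto3&#8217;s <cite>delete_cluster</cite> call; how the hook assembles them internally is an assumption:</p>

```python
# Sketch of the keyword arguments a delete_cluster call plausibly forwards to
# boto3's Redshift.Client.delete_cluster. The boto3 parameter names are real;
# how the hook assembles them internally is an assumption.
def delete_cluster_kwargs(cluster_identifier,
                          skip_final_cluster_snapshot=True,
                          final_cluster_snapshot_identifier=""):
    return {
        "ClusterIdentifier": cluster_identifier,
        "SkipFinalClusterSnapshot": skip_final_cluster_snapshot,
        "FinalClusterSnapshotIdentifier": final_cluster_snapshot_identifier,
    }

# To keep a final snapshot, disable the skip flag and name the snapshot:
kwargs = delete_cluster_kwargs("analytics",
                               skip_final_cluster_snapshot=False,
                               final_cluster_snapshot_identifier="analytics-final")
```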




<pre>
describe_cluster_snapshots(cluster_identifier)</pre>
<p>Gets a list of snapshots for a cluster</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>cluster_identifier</strong> (<em>str</em>) &#x2013; unique identifier of a cluster</td>
</tr>
</tbody>
</table>




<pre>
restore_from_cluster_snapshot(cluster_identifier, snapshot_identifier)</pre>
<p>Restores a cluster from its snapshot</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>cluster_identifier</strong> (<em>str</em>) &#x2013; unique identifier of a cluster</li>
<li><strong>snapshot_identifier</strong> (<em>str</em>) &#x2013; unique identifier for a snapshot of a cluster</li>
</ul>
</td>
</tr>
</tbody>
</table>






</div>
<div class="section" id="redshifttos3transfer">
<span id="id19"></span><h4 class="sigil_not_in_toc">RedshiftToS3Transfer</h4>

<pre>
class airflow.operators.redshift_to_s3_operator.RedshiftToS3Transfer(schema, table, s3_bucket, s3_key, redshift_conn_id=&apos;redshift_default&apos;, aws_conn_id=&apos;aws_default&apos;, unload_options=(), autocommit=False, parameters=None, include_header=False, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Executes an UNLOAD command to S3 as a CSV with headers</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>schema</strong> (<em>string</em>) &#x2013; reference to a specific schema in redshift database</li>
<li><strong>table</strong> (<em>string</em>) &#x2013; reference to a specific table in redshift database</li>
<li><strong>s3_bucket</strong> (<em>string</em>) &#x2013; reference to a specific S3 bucket</li>
<li><strong>s3_key</strong> (<em>string</em>) &#x2013; reference to a specific S3 key</li>
<li><strong>redshift_conn_id</strong> (<em>string</em>) &#x2013; reference to a specific redshift database</li>
<li><strong>aws_conn_id</strong> (<em>string</em>) &#x2013; reference to a specific S3 connection</li>
<li><strong>unload_options</strong> (<em>list</em>) &#x2013; reference to a list of UNLOAD options</li>
</ul>
</td>
</tr>
</tbody>
</table>
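<p>The operator&#8217;s effect can be sketched as building an UNLOAD statement from these parameters. The real operator also injects a credentials clause (elided here), and the schema, table and bucket names below are illustrative:</p>

```python
# Sketch of the UNLOAD statement this operator effectively issues. The real
# operator also injects a credentials clause (elided here); the schema, table
# and bucket names are illustrative.
def build_unload_query(schema, table, s3_bucket, s3_key, unload_options=()):
    select_query = f"SELECT * FROM {schema}.{table}"
    options = " ".join(unload_options)
    return f"UNLOAD ('{select_query}') TO 's3://{s3_bucket}/{s3_key}/{table}_' {options}".strip()

query = build_unload_query("public", "events", "my-bucket", "exports",
                           unload_options=("DELIMITER ','", "PARALLEL OFF"))
```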



</div>
<div class="section" id="s3toredshifttransfer">
<span id="id20"></span><h4 class="sigil_not_in_toc">S3ToRedshiftTransfer</h4>

<pre>
class airflow.operators.s3_to_redshift_operator.S3ToRedshiftTransfer(schema, table, s3_bucket, s3_key, redshift_conn_id=&apos;redshift_default&apos;, aws_conn_id=&apos;aws_default&apos;, copy_options=(), autocommit=False, parameters=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Executes a COPY command to load files from S3 into Redshift</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>schema</strong> (<em>string</em>) &#x2013; reference to a specific schema in redshift database</li>
<li><strong>table</strong> (<em>string</em>) &#x2013; reference to a specific table in redshift database</li>
<li><strong>s3_bucket</strong> (<em>string</em>) &#x2013; reference to a specific S3 bucket</li>
<li><strong>s3_key</strong> (<em>string</em>) &#x2013; reference to a specific S3 key</li>
<li><strong>redshift_conn_id</strong> (<em>string</em>) &#x2013; reference to a specific redshift database</li>
<li><strong>aws_conn_id</strong> (<em>string</em>) &#x2013; reference to a specific S3 connection</li>
<li><strong>copy_options</strong> (<em>list</em>) &#x2013; reference to a list of COPY options</li>
</ul>
</td>
</tr>
</tbody>
</table>
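<p>Symmetrically with the UNLOAD case, the operator&#8217;s effect can be sketched as a COPY statement built from these parameters (credentials clause elided; names are illustrative):</p>

```python
# Sketch of the COPY statement this operator effectively issues (credentials
# clause elided; schema, table and bucket names are illustrative).
def build_copy_query(schema, table, s3_bucket, s3_key, copy_options=()):
    options = " ".join(copy_options)
    return f"COPY {schema}.{table} FROM 's3://{s3_bucket}/{s3_key}/{table}' {options}".strip()

query = build_copy_query("public", "events", "my-bucket", "staging",
                         copy_options=("CSV", "IGNOREHEADER 1"))
```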



</div>
</div>
</div>
<div class="section" id="databricks">
<span id="id21"></span><h2 class="sigil_not_in_toc">Databricks</h2>
<p><a class="reference external" href="https://databricks.com/">Databricks</a> has contributed an Airflow operator which enables
submitting runs to the Databricks platform. Internally the operator talks to the
<code class="docutils literal notranslate"><span class="pre">api/2.0/jobs/runs/submit</span></code> <a class="reference external" href="https://docs.databricks.com/api/latest/jobs.html#runs-submit">endpoint</a>.</p>
<div class="section" id="databrickssubmitrunoperator">
<h3 class="sigil_not_in_toc">DatabricksSubmitRunOperator</h3>

<pre>
class airflow.contrib.operators.databricks_operator.DatabricksSubmitRunOperator(json=None, spark_jar_task=None, notebook_task=None, new_cluster=None, existing_cluster_id=None, libraries=None, run_name=None, timeout_seconds=None, databricks_conn_id=&apos;databricks_default&apos;, polling_period_seconds=30, databricks_retry_limit=3, do_xcom_push=False, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Submits a Spark job run to Databricks using the
<a class="reference external" href="https://docs.databricks.com/api/latest/jobs.html#runs-submit">api/2.0/jobs/runs/submit</a>
API endpoint.</p>
<p>There are two ways to instantiate this operator.</p>
<p>In the first way, you can take the JSON payload that you typically use
to call the <code class="docutils literal notranslate"><span class="pre">api/2.0/jobs/runs/submit</span></code> endpoint and pass it directly
to our <code class="docutils literal notranslate"><span class="pre">DatabricksSubmitRunOperator</span></code> through the <code class="docutils literal notranslate"><span class="pre">json</span></code> parameter.
For example</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">json</span> <span class="o">=</span> <span class="p">{</span>
  <span class="s1">&apos;new_cluster&apos;</span><span class="p">:</span> <span class="p">{</span>
    <span class="s1">&apos;spark_version&apos;</span><span class="p">:</span> <span class="s1">&apos;2.1.0-db3-scala2.11&apos;</span><span class="p">,</span>
    <span class="s1">&apos;num_workers&apos;</span><span class="p">:</span> <span class="mi">2</span>
  <span class="p">},</span>
  <span class="s1">&apos;notebook_task&apos;</span><span class="p">:</span> <span class="p">{</span>
    <span class="s1">&apos;notebook_path&apos;</span><span class="p">:</span> <span class="s1">&apos;/Users/airflow@example.com/PrepareData&apos;</span><span class="p">,</span>
  <span class="p">},</span>
<span class="p">}</span>
<span class="n">notebook_run</span> <span class="o">=</span> <span class="n">DatabricksSubmitRunOperator</span><span class="p">(</span><span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;notebook_run&apos;</span><span class="p">,</span> <span class="n">json</span><span class="o">=</span><span class="n">json</span><span class="p">)</span>
</pre>
</div>
</div>
<p>Another way to accomplish the same thing is to use the named parameters
of the <code class="docutils literal notranslate"><span class="pre">DatabricksSubmitRunOperator</span></code> directly. Note that there is exactly
one named parameter for each top level parameter in the <code class="docutils literal notranslate"><span class="pre">runs/submit</span></code>
endpoint. In this method, your code would look like this:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">new_cluster</span> <span class="o">=</span> <span class="p">{</span>
  <span class="s1">&apos;spark_version&apos;</span><span class="p">:</span> <span class="s1">&apos;2.1.0-db3-scala2.11&apos;</span><span class="p">,</span>
  <span class="s1">&apos;num_workers&apos;</span><span class="p">:</span> <span class="mi">2</span>
<span class="p">}</span>
<span class="n">notebook_task</span> <span class="o">=</span> <span class="p">{</span>
  <span class="s1">&apos;notebook_path&apos;</span><span class="p">:</span> <span class="s1">&apos;/Users/airflow@example.com/PrepareData&apos;</span><span class="p">,</span>
<span class="p">}</span>
<span class="n">notebook_run</span> <span class="o">=</span> <span class="n">DatabricksSubmitRunOperator</span><span class="p">(</span>
    <span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;notebook_run&apos;</span><span class="p">,</span>
    <span class="n">new_cluster</span><span class="o">=</span><span class="n">new_cluster</span><span class="p">,</span>
    <span class="n">notebook_task</span><span class="o">=</span><span class="n">notebook_task</span><span class="p">)</span>
</pre>
</div>
</div>
<p>In the case where both the json parameter <strong>AND</strong> the named parameters
are provided, they will be merged together. If there are conflicts during the merge,
the named parameters will take precedence and override the top level <code class="docutils literal notranslate"><span class="pre">json</span></code> keys.</p>
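<p>The precedence rule can be demonstrated in plain Python (this mimics the documented behaviour, not the operator&#8217;s actual source):</p>

```python
# Plain-Python illustration of the documented merge rule: named parameters
# override top-level keys of the json payload on conflict.
json_payload = {"run_name": "from_json", "timeout_seconds": 600}
named_params = {"run_name": "notebook_run"}  # e.g. a run_name passed explicitly

merged = {**json_payload, **named_params}  # named parameters win on conflict
print(merged)
```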

<p>Currently the named parameters that <code class="docutils literal notranslate"><span class="pre">DatabricksSubmitRunOperator</span></code> supports are:</p>
<ul class="first last simple">
<li><code class="docutils literal notranslate"><span class="pre">spark_jar_task</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">notebook_task</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">new_cluster</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">existing_cluster_id</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">libraries</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">run_name</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">timeout_seconds</span></code></li>
</ul>


<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>json</strong> (<em>dict</em>) &#x2013; <p>A JSON object containing API parameters which will be passed
directly to the <code class="docutils literal notranslate"><span class="pre">api/2.0/jobs/runs/submit</span></code> endpoint. The other named parameters
(e.g. <code class="docutils literal notranslate"><span class="pre">spark_jar_task</span></code>, <code class="docutils literal notranslate"><span class="pre">notebook_task</span></code>) to this operator will
be merged with this json dictionary if they are provided.
If there are conflicts during the merge, the named parameters will
take precedence and override the top level json keys. (templated)</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last">For more information about templating see <a class="reference internal" href="concepts.html#jinja-templating"><span class="std std-ref">Jinja Templating</span></a>.
<a class="reference external" href="https://docs.databricks.com/api/latest/jobs.html#runs-submit">https://docs.databricks.com/api/latest/jobs.html#runs-submit</a></p>
</div>
</li>
<li><strong>spark_jar_task</strong> (<em>dict</em>) &#x2013; <p>The main class and parameters for the JAR task. Note that
the actual JAR is specified in the <code class="docutils literal notranslate"><span class="pre">libraries</span></code>.
<em>EITHER</em> <code class="docutils literal notranslate"><span class="pre">spark_jar_task</span></code> <em>OR</em> <code class="docutils literal notranslate"><span class="pre">notebook_task</span></code> should be specified.
This field will be templated.</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last"><a class="reference external" href="https://docs.databricks.com/api/latest/jobs.html#jobssparkjartask">https://docs.databricks.com/api/latest/jobs.html#jobssparkjartask</a></p>
</div>
</li>
<li><strong>notebook_task</strong> (<em>dict</em>) &#x2013; <p>The notebook path and parameters for the notebook task.
<em>EITHER</em> <code class="docutils literal notranslate"><span class="pre">spark_jar_task</span></code> <em>OR</em> <code class="docutils literal notranslate"><span class="pre">notebook_task</span></code> should be specified.
This field will be templated.</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last"><a class="reference external" href="https://docs.databricks.com/api/latest/jobs.html#jobsnotebooktask">https://docs.databricks.com/api/latest/jobs.html#jobsnotebooktask</a></p>
</div>
</li>
<li><strong>new_cluster</strong> (<em>dict</em>) &#x2013; <p>Specs for a new cluster on which this task will be run.
<em>EITHER</em> <code class="docutils literal notranslate"><span class="pre">new_cluster</span></code> <em>OR</em> <code class="docutils literal notranslate"><span class="pre">existing_cluster_id</span></code> should be specified.
This field will be templated.</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last"><a class="reference external" href="https://docs.databricks.com/api/latest/jobs.html#jobsclusterspecnewcluster">https://docs.databricks.com/api/latest/jobs.html#jobsclusterspecnewcluster</a></p>
</div>
</li>
<li><strong>existing_cluster_id</strong> (<em>string</em>) &#x2013; ID for existing cluster on which to run this task.
<em>EITHER</em> <code class="docutils literal notranslate"><span class="pre">new_cluster</span></code> <em>OR</em> <code class="docutils literal notranslate"><span class="pre">existing_cluster_id</span></code> should be specified.
This field will be templated.</li>
<li><strong>libraries</strong> (<em>list of dicts</em>) &#x2013; <p>Libraries which this run will use.
This field will be templated.</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last"><a class="reference external" href="https://docs.databricks.com/api/latest/libraries.html#managedlibrarieslibrary">https://docs.databricks.com/api/latest/libraries.html#managedlibrarieslibrary</a></p>
</div>
</li>
<li><strong>run_name</strong> (<em>string</em>) &#x2013; The run name used for this task.
By default this will be set to the Airflow <code class="docutils literal notranslate"><span class="pre">task_id</span></code>. This <code class="docutils literal notranslate"><span class="pre">task_id</span></code> is a
required parameter of the superclass <code class="docutils literal notranslate"><span class="pre">BaseOperator</span></code>.
This field will be templated.</li>
<li><strong>timeout_seconds</strong> (<em>int32</em>) &#x2013; The timeout for this run. By default a value of 0 is used
which means to have no timeout.
This field will be templated.</li>
<li><strong>databricks_conn_id</strong> (<em>string</em>) &#x2013; The name of the Airflow connection to use.
By default and in the common case this will be <code class="docutils literal notranslate"><span class="pre">databricks_default</span></code>. To use
token based authentication, provide the key <code class="docutils literal notranslate"><span class="pre">token</span></code> in the extra field for the
connection.</li>
<li><strong>polling_period_seconds</strong> (<em>int</em>) &#x2013; Controls the rate which we poll for the result of
this run. By default the operator will poll every 30 seconds.</li>
<li><strong>databricks_retry_limit</strong> (<em>int</em>) &#x2013; Number of times to retry if the Databricks backend is
unreachable. Its value must be greater than or equal to 1.</li>
<li><strong>do_xcom_push</strong> (<em>boolean</em>) &#x2013; Whether we should push run_id and run_page_url to xcom.</li>
</ul>
</td>
</tr>
</tbody>
</table>



</div>
</div>
<div class="section" id="gcp-google-cloud-platform">
<span id="gcp"></span><h2 class="sigil_not_in_toc">GCP: Google Cloud Platform</h2>
<p>Airflow has extensive support for the Google Cloud Platform. Note, however, that most Hooks and
Operators are in the contrib section, which means they have a <em>beta</em> status: they
can have breaking changes between minor releases.</p>
<p>See the <a class="reference internal" href="howto/manage-connections.html#connection-type-gcp"><span class="std std-ref">GCP connection type</span></a> documentation to
configure connections to GCP.</p>
<div class="section" id="id22">
<h3 class="sigil_not_in_toc">Logging</h3>
<p>Airflow can be configured to read and write task logs in Google Cloud Storage.
See <a class="reference internal" href="howto/write-logs.html#write-logs-gcp"><span class="std std-ref">Writing Logs to Google Cloud Storage</span></a>.</p>
</div>
<div class="section" id="bigquery">
<h3 class="sigil_not_in_toc">BigQuery</h3>
<div class="section" id="bigquery-operators">
<h4 class="sigil_not_in_toc">BigQuery Operators</h4>
<ul class="simple">
<li><a class="reference internal" href="#bigquerycheckoperator"><span class="std std-ref">BigQueryCheckOperator</span></a> : Performs checks against a SQL query that will return a single row with different values.</li>
<li><a class="reference internal" href="#bigqueryvaluecheckoperator"><span class="std std-ref">BigQueryValueCheckOperator</span></a> : Performs a simple value check using SQL code.</li>
<li><a class="reference internal" href="#bigqueryintervalcheckoperator"><span class="std std-ref">BigQueryIntervalCheckOperator</span></a> : Checks that the values of metrics given as SQL expressions are within a certain tolerance of the ones from days_back before.</li>
<li><a class="reference internal" href="#bigquerycreateemptytableoperator"><span class="std std-ref">BigQueryCreateEmptyTableOperator</span></a> : Creates a new, empty table in the specified BigQuery dataset optionally with schema.</li>
<li><a class="reference internal" href="#bigquerycreateexternaltableoperator"><span class="std std-ref">BigQueryCreateExternalTableOperator</span></a> : Creates a new, external table in the dataset with the data in Google Cloud Storage.</li>
<li><a class="reference internal" href="#bigquerydeletedatasetoperator"><span class="std std-ref">BigQueryDeleteDatasetOperator</span></a> : Deletes an existing BigQuery dataset.</li>
<li><a class="reference internal" href="#bigqueryoperator"><span class="std std-ref">BigQueryOperator</span></a> : Executes BigQuery SQL queries in a specific BigQuery database.</li>
<li><a class="reference internal" href="#bigquerytobigqueryoperator"><span class="std std-ref">BigQueryToBigQueryOperator</span></a> : Copy a BigQuery table to another BigQuery table.</li>
<li><a class="reference internal" href="#bigquerytocloudstorageoperator"><span class="std std-ref">BigQueryToCloudStorageOperator</span></a> : Transfers a BigQuery table to a Google Cloud Storage bucket</li>
</ul>
<div class="section" id="bigquerycheckoperator">
<span id="id23"></span><h5 class="sigil_not_in_toc">BigQueryCheckOperator</h5>

<pre>
class airflow.contrib.operators.bigquery_check_operator.BigQueryCheckOperator(sql, bigquery_conn_id=&apos;bigquery_default&apos;, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.operators.check_operator.CheckOperator" title="airflow.operators.check_operator.CheckOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.operators.check_operator.CheckOperator</span></code></a></p>
<p>Performs checks against BigQuery. The <code class="docutils literal notranslate"><span class="pre">BigQueryCheckOperator</span></code> expects
a SQL query that will return a single row. Each value on that
first row is evaluated using Python <code class="docutils literal notranslate"><span class="pre">bool</span></code> casting. If any of the
values casts to <code class="docutils literal notranslate"><span class="pre">False</span></code>, the check fails and errors out.</p>
<p>Note that Python bool casting evaluates the following as <code class="docutils literal notranslate"><span class="pre">False</span></code>:</p>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">False</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">0</span></code></li>
<li>Empty string (<code class="docutils literal notranslate"><span class="pre">&quot;&quot;</span></code>)</li>
<li>Empty list (<code class="docutils literal notranslate"><span class="pre">[]</span></code>)</li>
<li>Empty dictionary or set (<code class="docutils literal notranslate"><span class="pre">{}</span></code>)</li>
</ul>
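<p>The bool-casting rule above can be sketched as evaluating the single returned row (simplified; not the operator&#8217;s actual source):</p>

```python
# Sketch (not the operator's source) of how the single returned row is
# evaluated: every value must be truthy under Python bool() casting.
def row_passes(row):
    return all(bool(value) for value in row)

print(row_passes((1204, "2018-01-01")))  # True: all values are truthy
print(row_passes((0, "2018-01-01")))     # False: a zero count fails the check
```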
<p>Given a query like <code class="docutils literal notranslate"><span class="pre">SELECT</span> <span class="pre">COUNT(*)</span> <span class="pre">FROM</span> <span class="pre">foo</span></code>, it will fail only if
the count <code class="docutils literal notranslate"><span class="pre">==</span> <span class="pre">0</span></code>. You can craft a much more complex query that could,
for instance, check that the table has the same number of rows as
the source table upstream, or that the count of today&#x2019;s partition is
greater than yesterday&#x2019;s partition, or that a set of metrics are within
3 standard deviations of the 7-day average.</p>
<p>This operator can be used as a data quality check in your pipeline, and
depending on where you put it in your DAG, you have the choice to
stop the critical path, preventing dubious data from being
published, or to run it on the side and receive email alerts
without stopping the progress of the DAG.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>sql</strong> (<em>string</em>) &#x2013; the sql to be executed</li>
<li><strong>bigquery_conn_id</strong> (<em>string</em>) &#x2013; reference to the BigQuery database</li>
</ul>
</td>
</tr>
</tbody>
</table>



</div>
<div class="section" id="bigqueryvaluecheckoperator">
<span id="id24"></span><h5 class="sigil_not_in_toc">BigQueryValueCheckOperator</h5>

<pre>
class airflow.contrib.operators.bigquery_check_operator.BigQueryValueCheckOperator(sql, pass_value, tolerance=None, bigquery_conn_id=&apos;bigquery_default&apos;, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.operators.check_operator.ValueCheckOperator" title="airflow.operators.check_operator.ValueCheckOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.operators.check_operator.ValueCheckOperator</span></code></a></p>
<p>Performs a simple value check using sql code.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>sql</strong> (<em>string</em>) &#x2013; the sql to be executed</td>
</tr>
</tbody>
</table>
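<p>A sketch of the comparison, assuming <code class="docutils literal notranslate"><span class="pre">tolerance</span></code> is a fractional band around <code class="docutils literal notranslate"><span class="pre">pass_value</span></code> (an illustration of the documented behavior, not Airflow&#x2019;s source):</p>

```python
# Illustrative model: the single value returned by the query must equal
# pass_value exactly, or fall within pass_value * (1 +/- tolerance)
# when a tolerance is supplied.
def value_check_passes(result, pass_value, tolerance=None):
    if tolerance is None:
        return result == pass_value
    return pass_value * (1 - tolerance) <= result <= pass_value * (1 + tolerance)

print(value_check_passes(95, 100, tolerance=0.1))  # within the 90-110 band
print(value_check_passes(80, 100, tolerance=0.1))  # outside the band
```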



</div>
<div class="section" id="bigqueryintervalcheckoperator">
<span id="id25"></span><h5 class="sigil_not_in_toc">BigQueryIntervalCheckOperator</h5>

<pre>
class airflow.contrib.operators.bigquery_check_operator.BigQueryIntervalCheckOperator(table, metrics_thresholds, date_filter_column=&apos;ds&apos;, days_back=-7, bigquery_conn_id=&apos;bigquery_default&apos;, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.operators.check_operator.IntervalCheckOperator" title="airflow.operators.check_operator.IntervalCheckOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.operators.check_operator.IntervalCheckOperator</span></code></a></p>
<p>Checks that the values of metrics given as SQL expressions are within
a certain tolerance of the ones from days_back before.</p>
<p>This operator constructs a query like so:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">SELECT</span> <span class="p">{</span><span class="n">metrics_threshold_dict_key</span><span class="p">}</span> <span class="n">FROM</span> <span class="p">{</span><span class="n">table</span><span class="p">}</span>
    <span class="n">WHERE</span> <span class="p">{</span><span class="n">date_filter_column</span><span class="p">}</span><span class="o">=&lt;</span><span class="n">date</span><span class="o">&gt;</span>
</pre>
</div>
</div>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>table</strong> (<em>str</em>) &#x2013; the table name</li>
<li><strong>days_back</strong> (<em>int</em>) &#x2013; number of days between ds and the ds we want to check
against. Defaults to 7 days</li>
<li><strong>metrics_thresholds</strong> (<em>dict</em>) &#x2013; a dictionary of ratios indexed by metrics; for
example, &#x2018;COUNT(*)&#x2019;: 1.5 would require a difference of 50 percent or less
between the current day and the day days_back prior.</li>
</ul>
</td>
</tr>
</tbody>
</table>
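<p>The per-metric tolerance test can be sketched as a ratio check (assumed semantics for illustration; the real operator also handles NULL results and multiple metrics at once):</p>

```python
# Illustrative model for one metric: compare today's value against the value
# from days_back ago, and fail when the larger/smaller ratio exceeds the
# threshold configured in metrics_thresholds.
def interval_check_passes(current, reference, max_ratio):
    if current == reference:          # covers the both-zero case
        return True
    if min(current, reference) == 0:  # one side zero, the other not
        return False
    return max(current, reference) / min(current, reference) <= max_ratio

print(interval_check_passes(150, 100, 1.5))  # ratio 1.5 -> passes
print(interval_check_passes(300, 100, 1.5))  # ratio 3.0 -> fails
```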



</div>
<div class="section" id="bigquerygetdataoperator">
<span id="id26"></span><h5 class="sigil_not_in_toc">BigQueryGetDataOperator</h5>

<pre>
class airflow.contrib.operators.bigquery_get_data.BigQueryGetDataOperator(dataset_id, table_id, max_results=&apos;100&apos;, selected_fields=None, bigquery_conn_id=&apos;bigquery_default&apos;, delegate_to=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Fetches data from a BigQuery table (or from selected columns) and returns
it as a Python list. The number of elements in the returned list equals the
number of rows fetched. Each element is itself a list whose items are the
column values for that row.</p>
<p><strong>Example Result</strong>: <code class="docutils literal notranslate"><span class="pre">[[&apos;Tony&apos;,</span> <span class="pre">&apos;10&apos;],</span> <span class="pre">[&apos;Mike&apos;,</span> <span class="pre">&apos;20&apos;],</span> <span class="pre">[&apos;Steve&apos;,</span> <span class="pre">&apos;15&apos;]]</span></code></p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If you pass fields to <code class="docutils literal notranslate"><span class="pre">selected_fields</span></code> in a different order than the
columns in the BQ table, the data will still be returned in the table&#x2019;s
column order. For example, if the BQ table has three columns
<code class="docutils literal notranslate"><span class="pre">[A,B,C]</span></code> and you pass &#x2018;B,A&#x2019; in <code class="docutils literal notranslate"><span class="pre">selected_fields</span></code>,
the data is still returned in the form <code class="docutils literal notranslate"><span class="pre">&apos;A,B&apos;</span></code>.</p>
</div>
<p><strong>Example</strong>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">get_data</span> <span class="o">=</span> <span class="n">BigQueryGetDataOperator</span><span class="p">(</span>
    <span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;get_data_from_bq&apos;</span><span class="p">,</span>
    <span class="n">dataset_id</span><span class="o">=</span><span class="s1">&apos;test_dataset&apos;</span><span class="p">,</span>
    <span class="n">table_id</span><span class="o">=</span><span class="s1">&apos;Transaction_partitions&apos;</span><span class="p">,</span>
    <span class="n">max_results</span><span class="o">=</span><span class="s1">&apos;100&apos;</span><span class="p">,</span>
    <span class="n">selected_fields</span><span class="o">=</span><span class="s1">&apos;DATE&apos;</span><span class="p">,</span>
    <span class="n">bigquery_conn_id</span><span class="o">=</span><span class="s1">&apos;airflow-service-account&apos;</span>
<span class="p">)</span>
</pre>
</div>
</div>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>dataset_id</strong> &#x2013; The dataset ID of the requested table. (templated)</li>
<li><strong>table_id</strong> (<em>string</em>) &#x2013; The table ID of the requested table. (templated)</li>
<li><strong>max_results</strong> (<em>string</em>) &#x2013; The maximum number of records (rows) to be fetched
from the table. (templated)</li>
<li><strong>selected_fields</strong> (<em>string</em>) &#x2013; List of fields to return (comma-separated). If
unspecified, all fields are returned.</li>
<li><strong>bigquery_conn_id</strong> (<em>string</em>) &#x2013; reference to a specific BigQuery hook.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
</ul>
</td>
</tr>
</tbody>
</table>



</div>
<div class="section" id="bigquerycreateemptytableoperator">
<span id="id27"></span><h5 class="sigil_not_in_toc">BigQueryCreateEmptyTableOperator</h5>

<pre>
class airflow.contrib.operators.bigquery_operator.BigQueryCreateEmptyTableOperator(dataset_id, table_id, project_id=None, schema_fields=None, gcs_schema_object=None, time_partitioning={}, bigquery_conn_id=&apos;bigquery_default&apos;, google_cloud_storage_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Creates a new, empty table in the specified BigQuery dataset,
optionally with schema.</p>
<p>The schema to be used for the BigQuery table may be specified in one of
two ways. You may either pass the schema fields in directly, or you may
point the operator to a Google Cloud Storage object name. The object in
Google Cloud Storage must be a JSON file with the schema fields in it.
You can also create a table without a schema.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>project_id</strong> (<em>string</em>) &#x2013; The project to create the table into. (templated)</li>
<li><strong>dataset_id</strong> (<em>string</em>) &#x2013; The dataset to create the table into. (templated)</li>
<li><strong>table_id</strong> (<em>string</em>) &#x2013; The Name of the table to be created. (templated)</li>
<li><strong>schema_fields</strong> (<em>list</em>) &#x2013; <p>If set, the schema field list as defined here:
<a class="reference external" href="https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load.schema">https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load.schema</a></p>
<p><strong>Example</strong>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">schema_fields</span><span class="o">=</span><span class="p">[{</span><span class="s2">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;emp_name&quot;</span><span class="p">,</span> <span class="s2">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;STRING&quot;</span><span class="p">,</span> <span class="s2">&quot;mode&quot;</span><span class="p">:</span> <span class="s2">&quot;REQUIRED&quot;</span><span class="p">},</span>
               <span class="p">{</span><span class="s2">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;salary&quot;</span><span class="p">,</span> <span class="s2">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;INTEGER&quot;</span><span class="p">,</span> <span class="s2">&quot;mode&quot;</span><span class="p">:</span> <span class="s2">&quot;NULLABLE&quot;</span><span class="p">}]</span>
</pre>
</div>
</div>
</li>
<li><strong>gcs_schema_object</strong> (<em>string</em>) &#x2013; Full path to the JSON file containing
schema (templated). For
example: <code class="docutils literal notranslate"><span class="pre">gs://test-bucket/dir1/dir2/employee_schema.json</span></code></li>
<li><strong>time_partitioning</strong> (<em>dict</em>) &#x2013; <p>configure optional time partitioning fields i.e.
partition by field, type and  expiration as per API specifications.</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last"><a class="reference external" href="https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#timePartitioning">https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#timePartitioning</a></p>
</div>
</li>
<li><strong>bigquery_conn_id</strong> (<em>string</em>) &#x2013; Reference to a specific BigQuery hook.</li>
<li><strong>google_cloud_storage_conn_id</strong> (<em>string</em>) &#x2013; Reference to a specific Google
cloud storage hook.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any. For this to
work, the service account making the request must have domain-wide
delegation enabled.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p><strong>Example (with schema JSON in GCS)</strong>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">CreateTable</span> <span class="o">=</span> <span class="n">BigQueryCreateEmptyTableOperator</span><span class="p">(</span>
    <span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;BigQueryCreateEmptyTableOperator_task&apos;</span><span class="p">,</span>
    <span class="n">dataset_id</span><span class="o">=</span><span class="s1">&apos;ODS&apos;</span><span class="p">,</span>
    <span class="n">table_id</span><span class="o">=</span><span class="s1">&apos;Employees&apos;</span><span class="p">,</span>
    <span class="n">project_id</span><span class="o">=</span><span class="s1">&apos;internal-gcp-project&apos;</span><span class="p">,</span>
    <span class="n">gcs_schema_object</span><span class="o">=</span><span class="s1">&apos;gs://schema-bucket/employee_schema.json&apos;</span><span class="p">,</span>
    <span class="n">bigquery_conn_id</span><span class="o">=</span><span class="s1">&apos;airflow-service-account&apos;</span><span class="p">,</span>
    <span class="n">google_cloud_storage_conn_id</span><span class="o">=</span><span class="s1">&apos;airflow-service-account&apos;</span>
<span class="p">)</span>
</pre>
</div>
</div>
<p><strong>Corresponding Schema file</strong> (<code class="docutils literal notranslate"><span class="pre">employee_schema.json</span></code>):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span>
  <span class="p">{</span>
    <span class="s2">&quot;mode&quot;</span><span class="p">:</span> <span class="s2">&quot;NULLABLE&quot;</span><span class="p">,</span>
    <span class="s2">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;emp_name&quot;</span><span class="p">,</span>
    <span class="s2">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;STRING&quot;</span>
  <span class="p">},</span>
  <span class="p">{</span>
    <span class="s2">&quot;mode&quot;</span><span class="p">:</span> <span class="s2">&quot;REQUIRED&quot;</span><span class="p">,</span>
    <span class="s2">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;salary&quot;</span><span class="p">,</span>
    <span class="s2">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;INTEGER&quot;</span>
  <span class="p">}</span>
<span class="p">]</span>
</pre>
</div>
</div>
<p><strong>Example (with schema in the DAG)</strong>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">CreateTable</span> <span class="o">=</span> <span class="n">BigQueryCreateEmptyTableOperator</span><span class="p">(</span>
    <span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;BigQueryCreateEmptyTableOperator_task&apos;</span><span class="p">,</span>
    <span class="n">dataset_id</span><span class="o">=</span><span class="s1">&apos;ODS&apos;</span><span class="p">,</span>
    <span class="n">table_id</span><span class="o">=</span><span class="s1">&apos;Employees&apos;</span><span class="p">,</span>
    <span class="n">project_id</span><span class="o">=</span><span class="s1">&apos;internal-gcp-project&apos;</span><span class="p">,</span>
    <span class="n">schema_fields</span><span class="o">=</span><span class="p">[{</span><span class="s2">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;emp_name&quot;</span><span class="p">,</span> <span class="s2">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;STRING&quot;</span><span class="p">,</span> <span class="s2">&quot;mode&quot;</span><span class="p">:</span> <span class="s2">&quot;REQUIRED&quot;</span><span class="p">},</span>
                   <span class="p">{</span><span class="s2">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;salary&quot;</span><span class="p">,</span> <span class="s2">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;INTEGER&quot;</span><span class="p">,</span> <span class="s2">&quot;mode&quot;</span><span class="p">:</span> <span class="s2">&quot;NULLABLE&quot;</span><span class="p">}],</span>
    <span class="n">bigquery_conn_id</span><span class="o">=</span><span class="s1">&apos;airflow-service-account&apos;</span><span class="p">,</span>
    <span class="n">google_cloud_storage_conn_id</span><span class="o">=</span><span class="s1">&apos;airflow-service-account&apos;</span>
<span class="p">)</span>
</pre>
</div>
</div>



</div>
<div class="section" id="bigquerycreateexternaltableoperator">
<span id="id28"></span><h5 class="sigil_not_in_toc">BigQueryCreateExternalTableOperator</h5>

<pre>
class airflow.contrib.operators.bigquery_operator.BigQueryCreateExternalTableOperator(bucket, source_objects, destination_project_dataset_table, schema_fields=None, schema_object=None, source_format=&apos;CSV&apos;, compression=&apos;NONE&apos;, skip_leading_rows=0, field_delimiter=&apos;, &apos;, max_bad_records=0, quote_character=None, allow_quoted_newlines=False, allow_jagged_rows=False, bigquery_conn_id=&apos;bigquery_default&apos;, google_cloud_storage_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, src_fmt_configs={}, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Creates a new external table in the dataset with the data in Google Cloud
Storage.</p>
<p>The schema to be used for the BigQuery table may be specified in one of
two ways. You may either pass the schema fields in directly, or you may
point the operator to a Google Cloud Storage object name. The object in
Google Cloud Storage must be a JSON file with the schema fields in it.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; The bucket to point the external table to. (templated)</li>
<li><strong>source_objects</strong> &#x2013; List of Google cloud storage URIs to point
table to. (templated)
If source_format is &#x2018;DATASTORE_BACKUP&#x2019;, the list must only contain a single URI.</li>
<li><strong>destination_project_dataset_table</strong> (<em>string</em>) &#x2013; The dotted (&lt;project&gt;.)&lt;dataset&gt;.&lt;table&gt;
BigQuery table to load data into (templated). If &lt;project&gt; is not included,
project will be the project defined in the connection json.</li>
<li><strong>schema_fields</strong> (<em>list</em>) &#x2013; <p>If set, the schema field list as defined here:
<a class="reference external" href="https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load.schema">https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load.schema</a></p>
<p><strong>Example</strong>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">schema_fields</span><span class="o">=</span><span class="p">[{</span><span class="s2">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;emp_name&quot;</span><span class="p">,</span> <span class="s2">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;STRING&quot;</span><span class="p">,</span> <span class="s2">&quot;mode&quot;</span><span class="p">:</span> <span class="s2">&quot;REQUIRED&quot;</span><span class="p">},</span>
               <span class="p">{</span><span class="s2">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;salary&quot;</span><span class="p">,</span> <span class="s2">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;INTEGER&quot;</span><span class="p">,</span> <span class="s2">&quot;mode&quot;</span><span class="p">:</span> <span class="s2">&quot;NULLABLE&quot;</span><span class="p">}]</span>
</pre>
</div>
</div>
<p>Should not be set when source_format is &#x2018;DATASTORE_BACKUP&#x2019;.</p>
</li>
<li><strong>schema_object</strong> (<em>string</em>) &#x2013; If set, a GCS object path pointing to a .json file that
contains the schema for the table. (templated)</li>
<li><strong>source_format</strong> (<em>string</em>) &#x2013; File format of the data.</li>
<li><strong>compression</strong> (<em>string</em>) &#x2013; [Optional] The compression type of the data source.
Possible values include GZIP and NONE.
The default value is NONE.
This setting is ignored for Google Cloud Bigtable,
Google Cloud Datastore backups and Avro formats.</li>
<li><strong>skip_leading_rows</strong> (<em>int</em>) &#x2013; Number of rows to skip when loading from a CSV.</li>
<li><strong>field_delimiter</strong> (<em>string</em>) &#x2013; The delimiter to use for the CSV.</li>
<li><strong>max_bad_records</strong> (<em>int</em>) &#x2013; The maximum number of bad records that BigQuery can
ignore when running the job.</li>
<li><strong>quote_character</strong> (<em>string</em>) &#x2013; The value that is used to quote data sections in a CSV file.</li>
<li><strong>allow_quoted_newlines</strong> (<em>boolean</em>) &#x2013; Whether to allow quoted newlines (true) or not (false).</li>
<li><strong>allow_jagged_rows</strong> (<em>bool</em>) &#x2013; Accept rows that are missing trailing optional columns.
The missing values are treated as nulls. If false, records with missing trailing
columns are treated as bad records, and if there are too many bad records, an
invalid error is returned in the job result. Only applicable to CSV, ignored
for other formats.</li>
<li><strong>bigquery_conn_id</strong> (<em>string</em>) &#x2013; Reference to a specific BigQuery hook.</li>
<li><strong>google_cloud_storage_conn_id</strong> (<em>string</em>) &#x2013; Reference to a specific Google
cloud storage hook.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any. For this to
work, the service account making the request must have domain-wide
delegation enabled.</li>
<li><strong>src_fmt_configs</strong> (<em>dict</em>) &#x2013; configure optional fields specific to the source format</li>
</ul>
</td>
</tr>
</tbody>
</table>
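<p><strong>Example</strong> (a hypothetical configuration sketch; the bucket, object paths, table names, and connection ids are placeholders, not values from this document):</p>

```python
from airflow.contrib.operators.bigquery_operator import (
    BigQueryCreateExternalTableOperator,
)

# Hypothetical task: expose CSV files in GCS as an external BigQuery table,
# reading the schema from a JSON object in the same bucket.
create_external_table = BigQueryCreateExternalTableOperator(
    task_id='create_external_table',
    bucket='my-data-bucket',                    # placeholder bucket
    source_objects=['sales/2018/*.csv'],        # placeholder objects
    destination_project_dataset_table='my_dataset.external_sales',
    schema_object='schemas/sales_schema.json',  # placeholder schema path
    source_format='CSV',
    skip_leading_rows=1,
    bigquery_conn_id='bigquery_default',
    google_cloud_storage_conn_id='google_cloud_default'
)
```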



</div>
<div class="section" id="bigquerydeletedatasetoperator">
<span id="id29"></span><h5 class="sigil_not_in_toc">BigQueryDeleteDatasetOperator</h5>
</div>
<div class="section" id="bigqueryoperator">
<span id="id30"></span><h5 class="sigil_not_in_toc">BigQueryOperator</h5>

<pre>
class airflow.contrib.operators.bigquery_operator.BigQueryOperator(bql=None, sql=None, destination_dataset_table=False, write_disposition=&apos;WRITE_EMPTY&apos;, allow_large_results=False, flatten_results=False, bigquery_conn_id=&apos;bigquery_default&apos;, delegate_to=None, udf_config=False, use_legacy_sql=True, maximum_billing_tier=None, maximum_bytes_billed=None, create_disposition=&apos;CREATE_IF_NEEDED&apos;, schema_update_options=(), query_params=None, priority=&apos;INTERACTIVE&apos;, time_partitioning={}, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Executes BigQuery SQL queries in a specific BigQuery database</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bql</strong> (<em>string, list of strings, or reference to a template file; template
references are recognized by strings ending in &apos;.sql&apos;</em>) &#x2013; (Deprecated. Use <cite>sql</cite> parameter instead) the sql code to be
executed (templated)</li>
<li><strong>sql</strong> (<em>string, list of strings, or reference to a template file; template
references are recognized by strings ending in &apos;.sql&apos;</em>) &#x2013; the sql code to be executed (templated)</li>
<li><strong>destination_dataset_table</strong> (<em>string</em>) &#x2013; A dotted
(&lt;project&gt;.|&lt;project&gt;:)&lt;dataset&gt;.&lt;table&gt; that, if set, will store the results
of the query. (templated)</li>
<li><strong>write_disposition</strong> (<em>string</em>) &#x2013; Specifies the action that occurs if the destination table
already exists. (default: &#x2018;WRITE_EMPTY&#x2019;)</li>
<li><strong>create_disposition</strong> (<em>string</em>) &#x2013; Specifies whether the job is allowed to create new tables.
(default: &#x2018;CREATE_IF_NEEDED&#x2019;)</li>
<li><strong>allow_large_results</strong> (<em>boolean</em>) &#x2013; Whether to allow large results.</li>
<li><strong>flatten_results</strong> (<em>boolean</em>) &#x2013; If true and query uses legacy SQL dialect, flattens
all nested and repeated fields in the query results. <code class="docutils literal notranslate"><span class="pre">allow_large_results</span></code>
must be <code class="docutils literal notranslate"><span class="pre">true</span></code> if this is set to <code class="docutils literal notranslate"><span class="pre">false</span></code>. For standard SQL queries, this
flag is ignored and results are never flattened.</li>
<li><strong>bigquery_conn_id</strong> (<em>string</em>) &#x2013; reference to a specific BigQuery hook.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
<li><strong>udf_config</strong> (<em>list</em>) &#x2013; The User Defined Function configuration for the query.
See <a class="reference external" href="https://cloud.google.com/bigquery/user-defined-functions">https://cloud.google.com/bigquery/user-defined-functions</a> for details.</li>
<li><strong>use_legacy_sql</strong> (<em>boolean</em>) &#x2013; Whether to use legacy SQL (true) or standard SQL (false).</li>
<li><strong>maximum_billing_tier</strong> (<em>integer</em>) &#x2013; Positive integer that serves as a multiplier
of the basic price.
Defaults to None, in which case it uses the value set in the project.</li>
<li><strong>maximum_bytes_billed</strong> (<em>float</em>) &#x2013; Limits the bytes billed for this job.
Queries that will have bytes billed beyond this limit will fail
(without incurring a charge). If unspecified, this will be
set to your project default.</li>
<li><strong>schema_update_options</strong> (<em>tuple</em>) &#x2013; Allows the schema of the destination
table to be updated as a side effect of the load job.</li>
<li><strong>query_params</strong> (<em>dict</em>) &#x2013; a dictionary containing query parameter types and
values, passed to BigQuery.</li>
<li><strong>priority</strong> (<em>string</em>) &#x2013; Specifies a priority for the query.
Possible values include INTERACTIVE and BATCH.
The default value is INTERACTIVE.</li>
<li><strong>time_partitioning</strong> (<em>dict</em>) &#x2013; configure optional time partitioning fields i.e.
partition by field, type and
expiration as per API specifications. Note that &#x2018;field&#x2019; is not available in
conjunction with dataset.table$partition.</li>
</ul>
</td>
</tr>
</tbody>
</table>
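<p><strong>Example</strong> (a hypothetical configuration sketch; project, dataset, table names, and connection ids are placeholders):</p>

```python
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

# Hypothetical task: run a standard-SQL query and write the result into a
# destination table, replacing any existing contents.
daily_aggregate = BigQueryOperator(
    task_id='daily_aggregate',
    sql='SELECT name, SUM(salary) AS total '
        'FROM `my_dataset.employees` GROUP BY name',
    destination_dataset_table='my-project.my_dataset.daily_aggregate',
    write_disposition='WRITE_TRUNCATE',
    use_legacy_sql=False,
    bigquery_conn_id='bigquery_default'
)
```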



</div>
<div class="section" id="bigquerytabledeleteoperator">
<span id="id31"></span><h5 class="sigil_not_in_toc">BigQueryTableDeleteOperator</h5>

<pre>
class airflow.contrib.operators.bigquery_table_delete_operator.BigQueryTableDeleteOperator(deletion_dataset_table, bigquery_conn_id=&apos;bigquery_default&apos;, delegate_to=None, ignore_if_missing=False, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Deletes BigQuery tables</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>deletion_dataset_table</strong> (<em>string</em>) &#x2013; A dotted
(&lt;project&gt;.|&lt;project&gt;:)&lt;dataset&gt;.&lt;table&gt; that indicates which table
will be deleted. (templated)</li>
<li><strong>bigquery_conn_id</strong> (<em>string</em>) &#x2013; reference to a specific BigQuery hook.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
<li><strong>ignore_if_missing</strong> (<em>boolean</em>) &#x2013; if True, then return success even if the
requested table does not exist.</li>
</ul>
</td>
</tr>
</tbody>
</table>
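<p><strong>Example</strong> (a hypothetical configuration sketch; the table name and connection id are placeholders):</p>

```python
from airflow.contrib.operators.bigquery_table_delete_operator import (
    BigQueryTableDeleteOperator,
)

# Hypothetical task: drop a staging table, succeeding even if it is
# already gone.
delete_staging = BigQueryTableDeleteOperator(
    task_id='delete_staging_table',
    deletion_dataset_table='my-project.my_dataset.staging_table',
    ignore_if_missing=True,
    bigquery_conn_id='bigquery_default'
)
```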



</div>
<div class="section" id="bigquerytobigqueryoperator">
<span id="id32"></span><h5 class="sigil_not_in_toc">BigQueryToBigQueryOperator</h5>

<pre>
class airflow.contrib.operators.bigquery_to_bigquery.BigQueryToBigQueryOperator(source_project_dataset_tables, destination_project_dataset_table, write_disposition=&apos;WRITE_EMPTY&apos;, create_disposition=&apos;CREATE_IF_NEEDED&apos;, bigquery_conn_id=&apos;bigquery_default&apos;, delegate_to=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Copies data from one BigQuery table to another.</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last">For more details about these parameters:
<a class="reference external" href="https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.copy">https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.copy</a></p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>source_project_dataset_tables</strong> (<em>list|string</em>) &#x2013; One or more
dotted (project:|project.)&lt;dataset&gt;.&lt;table&gt; BigQuery tables to use as the
source data. If &lt;project&gt; is not included, project will be the
project defined in the connection json. Use a list if there are multiple
source tables. (templated)</li>
<li><strong>destination_project_dataset_table</strong> (<em>string</em>) &#x2013; The destination BigQuery
table. Format is: (project:|project.)&lt;dataset&gt;.&lt;table&gt; (templated)</li>
<li><strong>write_disposition</strong> (<em>string</em>) &#x2013; The write disposition if the table already exists.</li>
<li><strong>create_disposition</strong> (<em>string</em>) &#x2013; The create disposition if the table doesn&#x2019;t exist.</li>
<li><strong>bigquery_conn_id</strong> (<em>string</em>) &#x2013; reference to a specific BigQuery hook.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
</ul>
</td>
</tr>
</tbody>
</table>
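<p><strong>Example</strong> (a hypothetical configuration sketch; the table names and connection id are placeholders):</p>

```python
from airflow.contrib.operators.bigquery_to_bigquery import (
    BigQueryToBigQueryOperator,
)

# Hypothetical task: copy a table within BigQuery, failing if the
# destination already contains data (the WRITE_EMPTY default).
copy_table = BigQueryToBigQueryOperator(
    task_id='copy_table',
    source_project_dataset_tables='my-project.my_dataset.source_table',
    destination_project_dataset_table='my-project.my_dataset.dest_table',
    write_disposition='WRITE_EMPTY',
    bigquery_conn_id='bigquery_default'
)
```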



</div>
<div class="section" id="bigquerytocloudstorageoperator">
<span id="id37"></span><h5 class="sigil_not_in_toc">BigQueryToCloudStorageOperator</h5>

<pre>
class airflow.contrib.operators.bigquery_to_gcs.BigQueryToCloudStorageOperator(source_project_dataset_table, destination_cloud_storage_uris, compression=&apos;NONE&apos;, export_format=&apos;CSV&apos;, field_delimiter=&apos;, &apos;, print_header=True, bigquery_conn_id=&apos;bigquery_default&apos;, delegate_to=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Transfers a BigQuery table to a Google Cloud Storage bucket.</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last">For more details about these parameters:
<a class="reference external" href="https://cloud.google.com/bigquery/docs/reference/v2/jobs">https://cloud.google.com/bigquery/docs/reference/v2/jobs</a></p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>source_project_dataset_table</strong> (<em>string</em>) &#x2013; The dotted
(&lt;project&gt;.|&lt;project&gt;:)&lt;dataset&gt;.&lt;table&gt; BigQuery table to use as the source
data. If &lt;project&gt; is not included, the project defined in the
connection JSON will be used. (templated)</li>
<li><strong>destination_cloud_storage_uris</strong> (<em>list</em>) &#x2013; The destination Google Cloud
Storage URI (e.g. gs://some-bucket/some-file.txt). (templated) Follows
convention defined here:
https://cloud.google.com/bigquery/exporting-data-from-bigquery#exportingmultiple</li>
<li><strong>compression</strong> (<em>string</em>) &#x2013; Type of compression to use.</li>
<li><strong>export_format</strong> &#x2013; File format to export.</li>
<li><strong>field_delimiter</strong> (<em>string</em>) &#x2013; The delimiter to use when extracting to a CSV.</li>
<li><strong>print_header</strong> (<em>boolean</em>) &#x2013; Whether to print a header for a CSV file extract.</li>
<li><strong>bigquery_conn_id</strong> (<em>string</em>) &#x2013; reference to a specific BigQuery hook.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
</ul>
</td>
</tr>
</tbody>
</table>
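<p>In the style of the Dataflow examples later in this chapter, a hedged usage sketch of this operator (the task id, table and bucket names here are made up; <code class="docutils literal notranslate"><span class="pre">dag</span></code> is assumed to exist). BigQuery requires a single wildcard URI when the extracted data may exceed 1&#xa0;GB:</p>

```python
from airflow.contrib.operators.bigquery_to_gcs import (
    BigQueryToCloudStorageOperator)

export_to_gcs = BigQueryToCloudStorageOperator(
    task_id='export_my_table',
    source_project_dataset_table='my-project.my_dataset.my_table',
    # A single wildcard URI lets BigQuery shard large extracts.
    destination_cloud_storage_uris=['gs://my-bucket/exports/my_table-*.csv'],
    export_format='CSV',
    print_header=True,
    bigquery_conn_id='bigquery_default',
    dag=dag)
```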



</div>
</div>
<div class="section" id="bigqueryhook">
<h4 class="sigil_not_in_toc">BigQueryHook</h4>

<pre>
class airflow.contrib.hooks.bigquery_hook.BigQueryHook(bigquery_conn_id=&apos;bigquery_default&apos;, delegate_to=None, use_legacy_sql=True)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook" title="airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook</span></code></a>, <a class="reference internal" href="code.html#airflow.hooks.dbapi_hook.DbApiHook" title="airflow.hooks.dbapi_hook.DbApiHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.hooks.dbapi_hook.DbApiHook</span></code></a>, <code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.utils.log.logging_mixin.LoggingMixin</span></code></p>
<p>Interact with BigQuery. This hook uses the Google Cloud Platform
connection.</p>

<pre>
get_conn()</pre>
<p>Returns a BigQuery PEP 249 connection object.</p>




<pre>
get_pandas_df(sql, parameters=None, dialect=None)</pre>
<p>Returns a Pandas DataFrame for the results produced by a BigQuery
query. The DbApiHook method must be overridden because Pandas
doesn&#x2019;t support PEP 249 connections, except for SQLite. See:</p>
<p><a class="reference external" href="https://github.com/pydata/pandas/blob/master/pandas/io/sql.py#L447">https://github.com/pydata/pandas/blob/master/pandas/io/sql.py#L447</a>
<a class="reference external" href="https://github.com/pydata/pandas/issues/6900">https://github.com/pydata/pandas/issues/6900</a></p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>sql</strong> (<em>string</em>) &#x2013; The BigQuery SQL to execute.</li>
<li><strong>parameters</strong> (<em>mapping</em><em> or </em><em>iterable</em>) &#x2013; The parameters to render the SQL query with (not
used, leave to override superclass method)</li>
<li><strong>dialect</strong> (<em>string in {&apos;legacy&apos;</em><em>, </em><em>&apos;standard&apos;}</em>) &#x2013; Dialect of BigQuery SQL &#x2013; legacy SQL or standard SQL
defaults to use <cite>self.use_legacy_sql</cite> if not specified</li>
</ul>
</td>
</tr>
</tbody>
</table>
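<p>A minimal sketch of pulling query results into a DataFrame with this method (the query and dataset names are made up; the connection id shown is the default):</p>

```python
from airflow.contrib.hooks.bigquery_hook import BigQueryHook

hook = BigQueryHook(bigquery_conn_id='bigquery_default',
                    use_legacy_sql=False)
# dialect falls back to the hook's use_legacy_sql setting if not given.
df = hook.get_pandas_df(
    'SELECT name, COUNT(*) AS n FROM `my_dataset.my_table` GROUP BY name',
    dialect='standard')
print(df.head())
```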




<pre>
get_service()</pre>
<p>Returns a BigQuery service object.</p>




<pre>
insert_rows(table, rows, target_fields=None, commit_every=1000)</pre>
<p>Insertion is currently unsupported. Theoretically, you could use
BigQuery&#x2019;s streaming API to insert rows into a table, but this hasn&#x2019;t
been implemented; calling this method will raise an error.</p>




<pre>
table_exists(project_id, dataset_id, table_id)</pre>
<p>Checks for the existence of a table in Google BigQuery.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>project_id</strong> (<em>string</em>) &#x2013; The Google cloud project in which to look for the
table. The connection supplied to the hook must provide access to
the specified project.</li>
<li><strong>dataset_id</strong> (<em>string</em>) &#x2013; The name of the dataset in which to look for the
table.</li>
<li><strong>table_id</strong> (<em>string</em>) &#x2013; The name of the table to check the existence of.</li>
</ul>
</td>
</tr>
</tbody>
</table>
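<p>A short sketch of guarding downstream work on a table&#x2019;s existence (project, dataset and table names here are hypothetical):</p>

```python
from airflow.contrib.hooks.bigquery_hook import BigQueryHook

hook = BigQueryHook(bigquery_conn_id='bigquery_default')
if hook.table_exists(project_id='my-project',
                     dataset_id='my_dataset',
                     table_id='my_table'):
    print('table is present')
else:
    print('table is missing; skipping export')
```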






</div>
</div>
<div class="section" id="cloud-dataflow">
<h3 class="sigil_not_in_toc">Cloud DataFlow</h3>
<div class="section" id="dataflow-operators">
<h4 class="sigil_not_in_toc">DataFlow Operators</h4>
<ul class="simple">
<li><a class="reference internal" href="#dataflowjavaoperator"><span class="std std-ref">DataFlowJavaOperator</span></a> : launching Cloud Dataflow jobs written in Java.</li>
<li><a class="reference internal" href="#dataflowtemplateoperator"><span class="std std-ref">DataflowTemplateOperator</span></a> : launching a templated Cloud DataFlow batch job.</li>
<li><a class="reference internal" href="#dataflowpythonoperator"><span class="std std-ref">DataFlowPythonOperator</span></a> : launching Cloud Dataflow jobs written in Python.</li>
</ul>
<div class="section" id="dataflowjavaoperator">
<span id="id38"></span><h5 class="sigil_not_in_toc">DataFlowJavaOperator</h5>

<pre>
class airflow.contrib.operators.dataflow_operator.DataFlowJavaOperator(jar, dataflow_default_options=None, options=None, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, poll_sleep=10, job_class=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Start a Java Cloud DataFlow batch job. The parameters of the operation
will be passed to the job.</p>
<p>It&#x2019;s good practice to define dataflow_* parameters, such as the project,
zone and staging location, in the default_args of the DAG.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">default_args</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s1">&apos;dataflow_default_options&apos;</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">&apos;project&apos;</span><span class="p">:</span> <span class="s1">&apos;my-gcp-project&apos;</span><span class="p">,</span>
        <span class="s1">&apos;zone&apos;</span><span class="p">:</span> <span class="s1">&apos;europe-west1-d&apos;</span><span class="p">,</span>
        <span class="s1">&apos;stagingLocation&apos;</span><span class="p">:</span> <span class="s1">&apos;gs://my-staging-bucket/staging/&apos;</span>
    <span class="p">}</span>
<span class="p">}</span>
</pre>
</div>
</div>
<p>You need to pass the path to your Dataflow job as a file reference with the <code class="docutils literal notranslate"><span class="pre">jar</span></code>
parameter; the jar needs to be a self-executing jar (see the documentation here:
<a class="reference external" href="https://beam.apache.org/documentation/runners/dataflow/#self-executing-jar">https://beam.apache.org/documentation/runners/dataflow/#self-executing-jar</a>).
Use <code class="docutils literal notranslate"><span class="pre">options</span></code> to pass on options to your job.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>t1 = DataFlowJavaOperator(
    task_id='dataflow_example',
    jar='{{var.value.gcp_dataflow_base}}pipeline/build/libs/pipeline-example-1.0.jar',
    options={
        'autoscalingAlgorithm': 'BASIC',
        'maxNumWorkers': '50',
        'start': '{{ds}}',
        'partitionType': 'DAY',
        'labels': {'foo': 'bar'}
    },
    gcp_conn_id='gcp-airflow-service-account',
    dag=my_dag)
</pre>
</div>
</div>
<p>Both <code class="docutils literal notranslate"><span class="pre">jar</span></code> and <code class="docutils literal notranslate"><span class="pre">options</span></code> are templated so you can use variables in them.</p>



<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span>default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2016, 8, 1),
    'email': ['alex@vanboxel.be'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=30),
    'dataflow_default_options': {
        'project': 'my-gcp-project',
        'zone': 'us-central1-f',
        'stagingLocation': 'gs://bucket/tmp/dataflow/staging/',
    }
}

dag = DAG('test-dag', default_args=default_args)

task = DataFlowJavaOperator(
    gcp_conn_id='gcp_default',
    task_id='normalize-cal',
    jar='{{var.value.gcp_dataflow_base}}pipeline-ingress-cal-normalize-1.0.jar',
    options={
        'autoscalingAlgorithm': 'BASIC',
        'maxNumWorkers': '50',
        'start': '{{ds}}',
        'partitionType': 'DAY'
    },
    dag=dag)
</pre>
</div>
</div>
</div>
<div class="section" id="dataflowtemplateoperator">
<span id="id39"></span><h5 class="sigil_not_in_toc">DataflowTemplateOperator</h5>

<pre>
class airflow.contrib.operators.dataflow_operator.DataflowTemplateOperator(template, dataflow_default_options=None, parameters=None, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, poll_sleep=10, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Start a templated Cloud DataFlow batch job. The parameters of the operation
will be passed to the job.
It&#x2019;s good practice to define dataflow_* parameters, such as the project,
zone and staging location, in the default_args of the DAG.</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last"><a class="reference external" href="https://cloud.google.com/dataflow/docs/reference/rest/v1b3/LaunchTemplateParameters">https://cloud.google.com/dataflow/docs/reference/rest/v1b3/LaunchTemplateParameters</a>
<a class="reference external" href="https://cloud.google.com/dataflow/docs/reference/rest/v1b3/RuntimeEnvironment">https://cloud.google.com/dataflow/docs/reference/rest/v1b3/RuntimeEnvironment</a></p>
</div>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>default_args = {
    'dataflow_default_options': {
        'project': 'my-gcp-project',
        'zone': 'europe-west1-d',
        'tempLocation': 'gs://my-staging-bucket/staging/'
    }
}
</pre>
</div>
</div>
<p>You need to pass the path to your dataflow template as a file reference with the
<code class="docutils literal notranslate"><span class="pre">template</span></code> parameter. Use <code class="docutils literal notranslate"><span class="pre">parameters</span></code> to pass on parameters to your job.
Use <code class="docutils literal notranslate"><span class="pre">environment</span></code> to pass on runtime environment variables to your job.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>t1 = DataflowTemplateOperator(
    task_id='dataflow_example',
    template='{{var.value.gcp_dataflow_base}}',
    parameters={
        'inputFile': "gs://bucket/input/my_input.txt",
        'outputFile': "gs://bucket/output/my_output.txt"
    },
    gcp_conn_id='gcp-airflow-service-account',
    dag=my_dag)
</pre>
</div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">template</span></code>, <code class="docutils literal notranslate"><span class="pre">dataflow_default_options</span></code> and <code class="docutils literal notranslate"><span class="pre">parameters</span></code> are templated so you can
use variables in them.</p>



</div>
<div class="section" id="dataflowpythonoperator">
<span id="id40"></span><h5 class="sigil_not_in_toc">DataFlowPythonOperator</h5>

<pre>
class airflow.contrib.operators.dataflow_operator.DataFlowPythonOperator(py_file, py_options=None, dataflow_default_options=None, options=None, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, poll_sleep=10, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>

<pre>
execute(context)</pre>
<p>Execute the python dataflow job.</p>
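<p>The class is sparsely documented here; by analogy with the Java operator above, a hedged usage sketch (the file path, options and <code class="docutils literal notranslate"><span class="pre">dag</span></code> object are made up, not taken from the source):</p>

```python
from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

t2 = DataFlowPythonOperator(
    task_id='dataflow_python_example',
    py_file='gs://my-bucket/dataflow/my_pipeline.py',
    py_options=[],
    dataflow_default_options={
        'project': 'my-gcp-project',
        'staging_location': 'gs://my-staging-bucket/staging/'
    },
    options={
        'output': 'gs://my-bucket/output'
    },
    dag=dag)
```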






</div>
</div>
<div class="section" id="dataflowhook">
<h4 class="sigil_not_in_toc">DataFlowHook</h4>

<pre>
class airflow.contrib.hooks.gcp_dataflow_hook.DataFlowHook(gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, poll_sleep=10)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook" title="airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook</span></code></a></p>

<pre>
get_conn()</pre>
<p>Returns a Google Cloud Dataflow service object.</p>






</div>
</div>
<div class="section" id="cloud-dataproc">
<h3 class="sigil_not_in_toc">Cloud DataProc</h3>
<div class="section" id="dataproc-operators">
<h4 class="sigil_not_in_toc">DataProc Operators</h4>
<ul class="simple">
<li><a class="reference internal" href="#dataprocclustercreateoperator"><span class="std std-ref">DataprocClusterCreateOperator</span></a> : Create a new cluster on Google Cloud Dataproc.</li>
<li><a class="reference internal" href="#dataprocclusterdeleteoperator"><span class="std std-ref">DataprocClusterDeleteOperator</span></a> : Delete a cluster on Google Cloud Dataproc.</li>
<li><a class="reference internal" href="#dataprocclusterscaleoperator"><span class="std std-ref">DataprocClusterScaleOperator</span></a> : Scale up or down a cluster on Google Cloud Dataproc.</li>
<li><a class="reference internal" href="#dataprocpigoperator"><span class="std std-ref">DataProcPigOperator</span></a> : Start a Pig query Job on a Cloud DataProc cluster.</li>
<li><a class="reference internal" href="#dataprochiveoperator"><span class="std std-ref">DataProcHiveOperator</span></a> : Start a Hive query Job on a Cloud DataProc cluster.</li>
<li><a class="reference internal" href="#dataprocsparksqloperator"><span class="std std-ref">DataProcSparkSqlOperator</span></a> : Start a Spark SQL query Job on a Cloud DataProc cluster.</li>
<li><a class="reference internal" href="#dataprocsparkoperator"><span class="std std-ref">DataProcSparkOperator</span></a> : Start a Spark Job on a Cloud DataProc cluster.</li>
<li><a class="reference internal" href="#dataprochadoopoperator"><span class="std std-ref">DataProcHadoopOperator</span></a> : Start a Hadoop Job on a Cloud DataProc cluster.</li>
<li><a class="reference internal" href="#dataprocpysparkoperator"><span class="std std-ref">DataProcPySparkOperator</span></a> : Start a PySpark Job on a Cloud DataProc cluster.</li>
<li><a class="reference internal" href="#dataprocworkflowtemplateinstantiateoperator"><span class="std std-ref">DataprocWorkflowTemplateInstantiateOperator</span></a> : Instantiate a WorkflowTemplate on Google Cloud Dataproc.</li>
<li><a class="reference internal" href="#dataprocworkflowtemplateinstantiateinlineoperator"><span class="std std-ref">DataprocWorkflowTemplateInstantiateInlineOperator</span></a> : Instantiate a WorkflowTemplate Inline on Google Cloud Dataproc.</li>
</ul>
<div class="section" id="dataprocclustercreateoperator">
<span id="id41"></span><h5 class="sigil_not_in_toc">DataprocClusterCreateOperator</h5>

<pre>
class airflow.contrib.operators.dataproc_operator.DataprocClusterCreateOperator(cluster_name, project_id, num_workers, zone, network_uri=None, subnetwork_uri=None, internal_ip_only=None, tags=None, storage_bucket=None, init_actions_uris=None, init_action_timeout=&apos;10m&apos;, metadata=None, image_version=None, properties=None, master_machine_type=&apos;n1-standard-4&apos;, master_disk_size=500, worker_machine_type=&apos;n1-standard-4&apos;, worker_disk_size=500, num_preemptible_workers=0, labels=None, region=&apos;global&apos;, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, service_account=None, service_account_scopes=None, idle_delete_ttl=None, auto_delete_time=None, auto_delete_ttl=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Create a new cluster on Google Cloud Dataproc. The operator will wait until the
creation is successful or an error occurs in the creation process.</p>
<p>The parameters allow you to configure the cluster. Please refer to</p>
<p><a class="reference external" href="https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters">https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters</a></p>
<p>for a detailed explanation on the different parameters. Most of the configuration
parameters detailed in the link are available as a parameter to this operator.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>cluster_name</strong> (<em>string</em>) &#x2013; The name of the DataProc cluster to create. (templated)</li>
<li><strong>project_id</strong> (<em>string</em>) &#x2013; The ID of the google cloud project in which
to create the cluster. (templated)</li>
<li><strong>num_workers</strong> (<em>int</em>) &#x2013; The # of workers to spin up</li>
<li><strong>storage_bucket</strong> (<em>string</em>) &#x2013; The storage bucket to use, setting to None lets dataproc
generate a custom one for you</li>
<li><strong>init_actions_uris</strong> (<em>list</em><em>[</em><em>string</em><em>]</em>) &#x2013; List of GCS URIs containing
Dataproc initialization scripts</li>
<li><strong>init_action_timeout</strong> (<em>string</em>) &#x2013; Amount of time executable scripts in
init_actions_uris have to complete</li>
<li><strong>metadata</strong> (<em>dict</em>) &#x2013; dict of key-value google compute engine metadata entries
to add to all instances</li>
<li><strong>image_version</strong> (<em>string</em>) &#x2013; the version of software inside the Dataproc cluster</li>
<li><strong>properties</strong> (<em>dict</em>) &#x2013; dict of properties to set on
config files (e.g. spark-defaults.conf), see
<a class="reference external" href="https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters#SoftwareConfig">https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters#SoftwareConfig</a></li>
<li><strong>master_machine_type</strong> (<em>string</em>) &#x2013; Compute engine machine type to use for the master node</li>
<li><strong>master_disk_size</strong> (<em>int</em>) &#x2013; Disk size for the master node</li>
<li><strong>worker_machine_type</strong> (<em>string</em>) &#x2013; Compute engine machine type to use for the worker nodes</li>
<li><strong>worker_disk_size</strong> (<em>int</em>) &#x2013; Disk size for the worker nodes</li>
<li><strong>num_preemptible_workers</strong> (<em>int</em>) &#x2013; The # of preemptible worker nodes to spin up</li>
<li><strong>labels</strong> (<em>dict</em>) &#x2013; dict of labels to add to the cluster</li>
<li><strong>zone</strong> (<em>string</em>) &#x2013; The zone where the cluster will be located. (templated)</li>
<li><strong>network_uri</strong> (<em>string</em>) &#x2013; The network uri to be used for machine communication, cannot be
specified with subnetwork_uri</li>
<li><strong>subnetwork_uri</strong> (<em>string</em>) &#x2013; The subnetwork uri to be used for machine communication,
cannot be specified with network_uri</li>
<li><strong>internal_ip_only</strong> (<em>bool</em>) &#x2013; If true, all instances in the cluster will only
have internal IP addresses. This can only be enabled for subnetwork
enabled networks</li>
<li><strong>tags</strong> (<em>list</em><em>[</em><em>string</em><em>]</em>) &#x2013; The GCE tags to add to all instances</li>
<li><strong>region</strong> &#x2013; leave as &#x2018;global&#x2019;, might become relevant in the future. (templated)</li>
<li><strong>gcp_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use connecting to Google Cloud Platform.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
<li><strong>service_account</strong> (<em>string</em>) &#x2013; The service account of the dataproc instances.</li>
<li><strong>service_account_scopes</strong> (<em>list</em><em>[</em><em>string</em><em>]</em>) &#x2013; The URIs of service account scopes to be included.</li>
<li><strong>idle_delete_ttl</strong> (<em>int</em>) &#x2013; The longest duration that the cluster will
stay alive while idle. Passing this threshold causes the cluster to be
auto-deleted. A duration in seconds.</li>
<li><strong>auto_delete_time</strong> (<em>datetime.datetime</em>) &#x2013; The time when cluster will be auto-deleted.</li>
<li><strong>auto_delete_ttl</strong> (<em>int</em>) &#x2013; The lifetime of the cluster; the cluster will be
auto-deleted at the end of this duration.
A duration in seconds. (If auto_delete_time is set, this parameter will be ignored)</li>
</ul>
</td>
</tr>
</tbody>
</table>
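<p>A hedged usage sketch of creating a small cluster (cluster name, project and machine types here are invented for illustration; <code class="docutils literal notranslate"><span class="pre">dag</span></code> is assumed to exist):</p>

```python
from airflow.contrib.operators.dataproc_operator import (
    DataprocClusterCreateOperator)

create_cluster = DataprocClusterCreateOperator(
    task_id='create_dataproc_cluster',
    # cluster_name is templated, so one cluster per execution date.
    cluster_name='analysis-{{ ds_nodash }}',
    project_id='my-project',
    num_workers=2,
    zone='europe-west1-d',
    master_machine_type='n1-standard-4',
    worker_machine_type='n1-standard-4',
    # Auto-delete the cluster after one hour of inactivity (seconds).
    idle_delete_ttl=3600,
    dag=dag)
```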



</div>
<div class="section" id="dataprocclusterscaleoperator">
<span id="id42"></span><h5 class="sigil_not_in_toc">DataprocClusterScaleOperator</h5>

<pre>
class airflow.contrib.operators.dataproc_operator.DataprocClusterScaleOperator(cluster_name, project_id, region=&apos;global&apos;, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, num_workers=2, num_preemptible_workers=0, graceful_decommission_timeout=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Scale a cluster up or down on Google Cloud Dataproc.
The operator will wait until the cluster is re-scaled.</p>
<p><strong>Example</strong>:</p>

<pre>
t1 = DataprocClusterScaleOperator(
    task_id='dataproc_scale',
    project_id='my-project',
    cluster_name='cluster-1',
    num_workers=10,
    num_preemptible_workers=10,
    graceful_decommission_timeout='1h',
    dag=dag)</pre>

<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last">For more detail about scaling clusters, have a look at the reference:
<a class="reference external" href="https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/scaling-clusters">https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/scaling-clusters</a></p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>cluster_name</strong> (<em>string</em>) &#x2013; The name of the cluster to scale. (templated)</li>
<li><strong>project_id</strong> (<em>string</em>) &#x2013; The ID of the google cloud project in which
the cluster runs. (templated)</li>
<li><strong>region</strong> (<em>string</em>) &#x2013; The region for the dataproc cluster. (templated)</li>
<li><strong>gcp_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use connecting to Google Cloud Platform.</li>
<li><strong>num_workers</strong> (<em>int</em>) &#x2013; The new number of workers</li>
<li><strong>num_preemptible_workers</strong> (<em>int</em>) &#x2013; The new number of preemptible workers</li>
<li><strong>graceful_decommission_timeout</strong> (<em>string</em>) &#x2013; Timeout for graceful YARN decommissioning.
Maximum value is 1d</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
</ul>
</td>
</tr>
</tbody>
</table>
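<p>The <code class="docutils literal notranslate"><span class="pre">graceful_decommission_timeout</span></code> parameter takes a duration string such as <code class="docutils literal notranslate"><span class="pre">'30m'</span></code> or <code class="docutils literal notranslate"><span class="pre">'1h'</span></code>, capped at one day. A standalone sketch (not Airflow code; the helper name is hypothetical) of validating such a string against the 1d cap:</p>

```python
import re

# Seconds per unit for the duration suffixes accepted above.
_UNITS = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400}


def decommission_timeout_seconds(timeout):
    """Convert '30m'/'1h'-style strings to seconds; reject values over 1d."""
    m = re.match(r'^(\d+)([smhd])$', timeout)
    if m is None:
        raise ValueError('Expected e.g. "30m" or "1h", got: {}'.format(timeout))
    seconds = int(m.group(1)) * _UNITS[m.group(2)]
    if seconds > 86400:
        raise ValueError('Maximum graceful decommission timeout is 1d')
    return seconds
```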



</div>
<div class="section" id="dataprocclusterdeleteoperator">
<span id="id43"></span><h5 class="sigil_not_in_toc">DataprocClusterDeleteOperator</h5>

<pre>
class airflow.contrib.operators.dataproc_operator.DataprocClusterDeleteOperator(cluster_name, project_id, region=&apos;global&apos;, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Delete a cluster on Google Cloud Dataproc. The operator will wait until the
cluster is destroyed.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>cluster_name</strong> (<em>string</em>) &#x2013; The name of the cluster to delete. (templated)</li>
<li><strong>project_id</strong> (<em>string</em>) &#x2013; The ID of the google cloud project in which
the cluster runs. (templated)</li>
<li><strong>region</strong> (<em>string</em>) &#x2013; leave as &#x2018;global&#x2019;, might become relevant in the future. (templated)</li>
<li><strong>gcp_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use connecting to Google Cloud Platform.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
</ul>
</td>
</tr>
</tbody>
</table>



</div>
<div class="section" id="dataprocpigoperator">
<span id="id44"></span><h5 class="sigil_not_in_toc">DataProcPigOperator</h5>

<pre>
class airflow.contrib.operators.dataproc_operator.DataProcPigOperator(query=None, query_uri=None, variables=None, job_name=&apos;{{task.task_id}}_{{ds_nodash}}&apos;, cluster_name=&apos;cluster-1&apos;, dataproc_pig_properties=None, dataproc_pig_jars=None, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, region=&apos;global&apos;, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Start a Pig query Job on a Cloud DataProc cluster. The parameters of the operation
will be passed to the cluster.</p>
<p>It&#x2019;s a good practice to define dataproc_* parameters in the default_args of the dag
like the cluster name and UDFs.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">default_args</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s1">&apos;cluster_name&apos;</span><span class="p">:</span> <span class="s1">&apos;cluster-1&apos;</span><span class="p">,</span>
    <span class="s1">&apos;dataproc_pig_jars&apos;</span><span class="p">:</span> <span class="p">[</span>
        <span class="s1">&apos;gs://example/udf/jar/datafu/1.2.0/datafu.jar&apos;</span><span class="p">,</span>
        <span class="s1">&apos;gs://example/udf/jar/gpig/1.2/gpig.jar&apos;</span>
    <span class="p">]</span>
<span class="p">}</span>
</pre>
</div>
</div>
<p>You can pass a Pig script as a string or as a file reference. Use <cite>variables</cite>
to pass values that the Pig script resolves on the cluster, or use template
parameters that are resolved in the script before submission.</p>
<p><strong>Example</strong>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">t1</span> <span class="o">=</span> <span class="n">DataProcPigOperator</span><span class="p">(</span>
        <span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;dataproc_pig&apos;</span><span class="p">,</span>
        <span class="n">query</span><span class="o">=</span><span class="s1">&apos;a_pig_script.pig&apos;</span><span class="p">,</span>
        <span class="n">variables</span><span class="o">=</span><span class="p">{</span><span class="s1">&apos;out&apos;</span><span class="p">:</span> <span class="s1">&apos;gs://example/output/{{ds}}&apos;</span><span class="p">},</span>
        <span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
</pre>
</div>
</div>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last">For more detail on job submission, have a look at the reference:
<a class="reference external" href="https://cloud.google.com/dataproc/reference/rest/v1/projects.regions.jobs">https://cloud.google.com/dataproc/reference/rest/v1/projects.regions.jobs</a></p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>query</strong> (<em>string</em>) &#x2013; The query or reference to the query
file (pg or pig extension). (templated)</li>
<li><strong>query_uri</strong> (<em>string</em>) &#x2013; The uri of a pig script on Cloud Storage.</li>
<li><strong>variables</strong> (<em>dict</em>) &#x2013; Map of named parameters for the query. (templated)</li>
<li><strong>job_name</strong> (<em>string</em>) &#x2013; The job name used in the DataProc cluster. This
name by default is the task_id appended with the execution date, but can
be templated. The name will always be appended with a random number to
avoid name clashes. (templated)</li>
<li><strong>cluster_name</strong> (<em>string</em>) &#x2013; The name of the DataProc cluster. (templated)</li>
<li><strong>dataproc_pig_properties</strong> (<em>dict</em>) &#x2013; Map for the Pig properties. Ideal to put in
default arguments</li>
<li><strong>dataproc_pig_jars</strong> (<em>list</em>) &#x2013; URIs to jars provisioned in Cloud Storage (example: for
UDFs and libs) and are ideal to put in default arguments.</li>
<li><strong>gcp_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use connecting to Google Cloud Platform.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
<li><strong>region</strong> (<em>string</em>) &#x2013; The specified region where the dataproc cluster is created.</li>
</ul>
</td>
</tr>
</tbody>
</table>



</div>
<div class="section" id="dataprochiveoperator">
<span id="id45"></span><h5 class="sigil_not_in_toc">DataProcHiveOperator</h5>

<pre>
class airflow.contrib.operators.dataproc_operator.DataProcHiveOperator(query=None, query_uri=None, variables=None, job_name=&apos;{{task.task_id}}_{{ds_nodash}}&apos;, cluster_name=&apos;cluster-1&apos;, dataproc_hive_properties=None, dataproc_hive_jars=None, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, region=&apos;global&apos;, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Start a Hive query Job on a Cloud DataProc cluster.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>query</strong> (<em>string</em>) &#x2013; The query or reference to the query file (q extension).</li>
<li><strong>query_uri</strong> (<em>string</em>) &#x2013; The uri of a hive script on Cloud Storage.</li>
<li><strong>variables</strong> (<em>dict</em>) &#x2013; Map of named parameters for the query.</li>
<li><strong>job_name</strong> (<em>string</em>) &#x2013; The job name used in the DataProc cluster. This name by default
is the task_id appended with the execution date, but can be templated. The
name will always be appended with a random number to avoid name clashes.</li>
<li><strong>cluster_name</strong> (<em>string</em>) &#x2013; The name of the DataProc cluster.</li>
<li><strong>dataproc_hive_properties</strong> (<em>dict</em>) &#x2013; Map for the Hive properties. Ideal to put in
default arguments</li>
<li><strong>dataproc_hive_jars</strong> (<em>list</em>) &#x2013; URIs to jars provisioned in Cloud Storage (example: for
UDFs and libs) and are ideal to put in default arguments.</li>
<li><strong>gcp_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use connecting to Google Cloud Platform.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
<li><strong>region</strong> (<em>string</em>) &#x2013; The specified region where the dataproc cluster is created.</li>
</ul>
</td>
</tr>
</tbody>
</table>



</div>
<div class="section" id="dataprocsparksqloperator">
<span id="id46"></span><h5 class="sigil_not_in_toc">DataProcSparkSqlOperator</h5>

<pre>
class airflow.contrib.operators.dataproc_operator.DataProcSparkSqlOperator(query=None, query_uri=None, variables=None, job_name=&apos;{{task.task_id}}_{{ds_nodash}}&apos;, cluster_name=&apos;cluster-1&apos;, dataproc_spark_properties=None, dataproc_spark_jars=None, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, region=&apos;global&apos;, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Start a Spark SQL query Job on a Cloud DataProc cluster.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>query</strong> (<em>string</em>) &#x2013; The query or reference to the query file (q extension). (templated)</li>
<li><strong>query_uri</strong> (<em>string</em>) &#x2013; The uri of a spark sql script on Cloud Storage.</li>
<li><strong>variables</strong> (<em>dict</em>) &#x2013; Map of named parameters for the query. (templated)</li>
<li><strong>job_name</strong> (<em>string</em>) &#x2013; The job name used in the DataProc cluster. This
name by default is the task_id appended with the execution date, but can
be templated. The name will always be appended with a random number to
avoid name clashes. (templated)</li>
<li><strong>cluster_name</strong> (<em>string</em>) &#x2013; The name of the DataProc cluster. (templated)</li>
<li><strong>dataproc_spark_properties</strong> (<em>dict</em>) &#x2013; Map for the Spark SQL properties. Ideal to put in
default arguments</li>
<li><strong>dataproc_spark_jars</strong> (<em>list</em>) &#x2013; URIs to jars provisioned in Cloud Storage (example:
for UDFs and libs) and are ideal to put in default arguments.</li>
<li><strong>gcp_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use connecting to Google Cloud Platform.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
<li><strong>region</strong> (<em>string</em>) &#x2013; The specified region where the dataproc cluster is created.</li>
</ul>
</td>
</tr>
</tbody>
</table>



</div>
<div class="section" id="dataprocsparkoperator">
<span id="id47"></span><h5 class="sigil_not_in_toc">DataProcSparkOperator</h5>

<pre>
class airflow.contrib.operators.dataproc_operator.DataProcSparkOperator(main_jar=None, main_class=None, arguments=None, archives=None, files=None, job_name=&apos;{{task.task_id}}_{{ds_nodash}}&apos;, cluster_name=&apos;cluster-1&apos;, dataproc_spark_properties=None, dataproc_spark_jars=None, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, region=&apos;global&apos;, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Start a Spark Job on a Cloud DataProc cluster.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>main_jar</strong> (<em>string</em>) &#x2013; URI of the job jar provisioned on Cloud Storage. (use this or
the main_class, not both together).</li>
<li><strong>main_class</strong> (<em>string</em>) &#x2013; Name of the job class. (use this or the main_jar, not both
together).</li>
<li><strong>arguments</strong> (<em>list</em>) &#x2013; Arguments for the job. (templated)</li>
<li><strong>archives</strong> (<em>list</em>) &#x2013; List of archived files that will be unpacked in the work
directory. Should be stored in Cloud Storage.</li>
<li><strong>files</strong> (<em>list</em>) &#x2013; List of files to be copied to the working directory</li>
<li><strong>job_name</strong> (<em>string</em>) &#x2013; The job name used in the DataProc cluster. This
name by default is the task_id appended with the execution date, but can
be templated. The name will always be appended with a random number to
avoid name clashes. (templated)</li>
<li><strong>cluster_name</strong> (<em>string</em>) &#x2013; The name of the DataProc cluster. (templated)</li>
<li><strong>dataproc_spark_properties</strong> (<em>dict</em>) &#x2013; Map for the Spark properties. Ideal to put in
default arguments</li>
<li><strong>dataproc_spark_jars</strong> (<em>list</em>) &#x2013; URIs to jars provisioned in Cloud Storage (example:
for UDFs and libs) and are ideal to put in default arguments.</li>
<li><strong>gcp_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use connecting to Google Cloud Platform.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
<li><strong>region</strong> (<em>string</em>) &#x2013; The specified region where the dataproc cluster is created.</li>
</ul>
</td>
</tr>
</tbody>
</table>



</div>
<div class="section" id="dataprochadoopoperator">
<span id="id48"></span><h5 class="sigil_not_in_toc">DataProcHadoopOperator</h5>

<pre>
class airflow.contrib.operators.dataproc_operator.DataProcHadoopOperator(main_jar=None, main_class=None, arguments=None, archives=None, files=None, job_name=&apos;{{task.task_id}}_{{ds_nodash}}&apos;, cluster_name=&apos;cluster-1&apos;, dataproc_hadoop_properties=None, dataproc_hadoop_jars=None, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, region=&apos;global&apos;, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Start a Hadoop Job on a Cloud DataProc cluster.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>main_jar</strong> (<em>string</em>) &#x2013; URI of the job jar provisioned on Cloud Storage. (use this or
the main_class, not both together).</li>
<li><strong>main_class</strong> (<em>string</em>) &#x2013; Name of the job class. (use this or the main_jar, not both
together).</li>
<li><strong>arguments</strong> (<em>list</em>) &#x2013; Arguments for the job. (templated)</li>
<li><strong>archives</strong> (<em>list</em>) &#x2013; List of archived files that will be unpacked in the work
directory. Should be stored in Cloud Storage.</li>
<li><strong>files</strong> (<em>list</em>) &#x2013; List of files to be copied to the working directory</li>
<li><strong>job_name</strong> (<em>string</em>) &#x2013; The job name used in the DataProc cluster. This
name by default is the task_id appended with the execution date, but can
be templated. The name will always be appended with a random number to
avoid name clashes. (templated)</li>
<li><strong>cluster_name</strong> (<em>string</em>) &#x2013; The name of the DataProc cluster. (templated)</li>
<li><strong>dataproc_hadoop_properties</strong> (<em>dict</em>) &#x2013; Map for the Hadoop properties. Ideal to put in
default arguments</li>
<li><strong>dataproc_hadoop_jars</strong> (<em>list</em>) &#x2013; URIs to jars provisioned in Cloud Storage (example:
for UDFs and libs) and are ideal to put in default arguments.</li>
<li><strong>gcp_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use connecting to Google Cloud Platform.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
<li><strong>region</strong> (<em>string</em>) &#x2013; The specified region where the dataproc cluster is created.</li>
</ul>
</td>
</tr>
</tbody>
</table>



</div>
<div class="section" id="dataprocpysparkoperator">
<span id="id49"></span><h5 class="sigil_not_in_toc">DataProcPySparkOperator</h5>

<pre>
class airflow.contrib.operators.dataproc_operator.DataProcPySparkOperator(main, arguments=None, archives=None, pyfiles=None, files=None, job_name=&apos;{{task.task_id}}_{{ds_nodash}}&apos;, cluster_name=&apos;cluster-1&apos;, dataproc_pyspark_properties=None, dataproc_pyspark_jars=None, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, region=&apos;global&apos;, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Start a PySpark Job on a Cloud DataProc cluster.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>main</strong> (<em>string</em>) &#x2013; [Required] The Hadoop Compatible Filesystem (HCFS) URI of the main
Python file to use as the driver. Must be a .py file.</li>
<li><strong>arguments</strong> (<em>list</em>) &#x2013; Arguments for the job. (templated)</li>
<li><strong>archives</strong> (<em>list</em>) &#x2013; List of archived files that will be unpacked in the work
directory. Should be stored in Cloud Storage.</li>
<li><strong>files</strong> (<em>list</em>) &#x2013; List of files to be copied to the working directory</li>
<li><strong>pyfiles</strong> (<em>list</em>) &#x2013; List of Python files to pass to the PySpark framework.
Supported file types: .py, .egg, and .zip</li>
<li><strong>job_name</strong> (<em>string</em>) &#x2013; The job name used in the DataProc cluster. This
name by default is the task_id appended with the execution date, but can
be templated. The name will always be appended with a random number to
avoid name clashes. (templated)</li>
<li><strong>cluster_name</strong> (<em>string</em>) &#x2013; The name of the DataProc cluster.</li>
<li><strong>dataproc_pyspark_properties</strong> (<em>dict</em>) &#x2013; Map for the PySpark properties. Ideal to put in
default arguments</li>
<li><strong>dataproc_pyspark_jars</strong> (<em>list</em>) &#x2013; URIs to jars provisioned in Cloud Storage (example:
for UDFs and libs) and are ideal to put in default arguments.</li>
<li><strong>gcp_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use connecting to Google Cloud Platform.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have
domain-wide delegation enabled.</li>
<li><strong>region</strong> (<em>string</em>) &#x2013; The specified region where the dataproc cluster is created.</li>
</ul>
</td>
</tr>
</tbody>
</table>



</div>
<div class="section" id="dataprocworkflowtemplateinstantiateoperator">
<span id="id50"></span><h5 class="sigil_not_in_toc">DataprocWorkflowTemplateInstantiateOperator</h5>

<pre>
class airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateInstantiateOperator(template_id, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateBaseOperator" title="airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateBaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateBaseOperator</span></code></a></p>
<p>Instantiate a WorkflowTemplate on Google Cloud Dataproc. The operator will wait
until the WorkflowTemplate is finished executing.</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last">Please refer to:
<a class="reference external" href="https://cloud.google.com/dataproc/docs/reference/rest/v1beta2/projects.regions.workflowTemplates/instantiate">https://cloud.google.com/dataproc/docs/reference/rest/v1beta2/projects.regions.workflowTemplates/instantiate</a></p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>template_id</strong> (<em>string</em>) &#x2013; The id of the template. (templated)</li>
<li><strong>project_id</strong> (<em>string</em>) &#x2013; The ID of the google cloud project in which
the template runs</li>
<li><strong>region</strong> (<em>string</em>) &#x2013; leave as &#x2018;global&#x2019;, might become relevant in the future</li>
<li><strong>gcp_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use connecting to Google Cloud Platform.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
</ul>
</td>
</tr>
</tbody>
</table>



</div>
<div class="section" id="dataprocworkflowtemplateinstantiateinlineoperator">
<span id="id51"></span><h5 class="sigil_not_in_toc">DataprocWorkflowTemplateInstantiateInlineOperator</h5>

<pre>
class airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateInstantiateInlineOperator(template, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateBaseOperator" title="airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateBaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.contrib.operators.dataproc_operator.DataprocWorkflowTemplateBaseOperator</span></code></a></p>
<p>Instantiate a WorkflowTemplate Inline on Google Cloud Dataproc. The operator will
wait until the WorkflowTemplate is finished executing.</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last">Please refer to:
<a class="reference external" href="https://cloud.google.com/dataproc/docs/reference/rest/v1beta2/projects.regions.workflowTemplates/instantiateInline">https://cloud.google.com/dataproc/docs/reference/rest/v1beta2/projects.regions.workflowTemplates/instantiateInline</a></p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>template</strong> (<em>map</em>) &#x2013; The template contents. (templated)</li>
<li><strong>project_id</strong> (<em>string</em>) &#x2013; The ID of the google cloud project in which
the template runs</li>
<li><strong>region</strong> (<em>string</em>) &#x2013; leave as &#x2018;global&#x2019;, might become relevant in the future</li>
<li><strong>gcp_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use connecting to Google Cloud Platform.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
</ul>
</td>
</tr>
</tbody>
</table>



</div>
</div>
</div>
<div class="section" id="cloud-datastore">
<h3 class="sigil_not_in_toc">Cloud Datastore</h3>
<div class="section" id="datastore-operators">
<h4 class="sigil_not_in_toc">Datastore Operators</h4>
<ul class="simple">
<li><a class="reference internal" href="#datastoreexportoperator"><span class="std std-ref">DatastoreExportOperator</span></a> : Export entities from Google Cloud Datastore to Cloud Storage.</li>
<li><a class="reference internal" href="#datastoreimportoperator"><span class="std std-ref">DatastoreImportOperator</span></a> : Import entities from Cloud Storage to Google Cloud Datastore.</li>
</ul>
<div class="section" id="datastoreexportoperator">
<span id="id52"></span><h5 class="sigil_not_in_toc">DatastoreExportOperator</h5>

<pre>
class airflow.contrib.operators.datastore_export_operator.DatastoreExportOperator(bucket, namespace=None, datastore_conn_id=&apos;google_cloud_default&apos;, cloud_storage_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, entity_filter=None, labels=None, polling_interval_in_seconds=10, overwrite_existing=False, xcom_push=False, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Export entities from Google Cloud Datastore to Cloud Storage</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; name of the cloud storage bucket to backup data</li>
<li><strong>namespace</strong> (<em>str</em>) &#x2013; optional namespace path in the specified Cloud Storage bucket
to backup data. If this namespace does not exist in GCS, it will be created.</li>
<li><strong>datastore_conn_id</strong> (<em>string</em>) &#x2013; the name of the Datastore connection id to use</li>
<li><strong>cloud_storage_conn_id</strong> (<em>string</em>) &#x2013; the name of the cloud storage connection id to
force-write backup</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
<li><strong>entity_filter</strong> (<em>dict</em>) &#x2013; description of what data from the project is included in the
export, refer to
<a class="reference external" href="https://cloud.google.com/datastore/docs/reference/rest/Shared.Types/EntityFilter">https://cloud.google.com/datastore/docs/reference/rest/Shared.Types/EntityFilter</a></li>
<li><strong>labels</strong> (<em>dict</em>) &#x2013; client-assigned labels for cloud storage</li>
<li><strong>polling_interval_in_seconds</strong> (<em>int</em>) &#x2013; number of seconds to wait before polling for
execution status again</li>
<li><strong>overwrite_existing</strong> (<em>bool</em>) &#x2013; if the storage bucket + namespace is not empty, it will be
emptied prior to exports. This enables overwriting existing backups.</li>
<li><strong>xcom_push</strong> (<em>bool</em>) &#x2013; push operation name to xcom for reference</li>
</ul>
</td>
</tr>
</tbody>
</table>



</div>
<div class="section" id="datastoreimportoperator">
<span id="id53"></span><h5 class="sigil_not_in_toc">DatastoreImportOperator</h5>

<pre>
class airflow.contrib.operators.datastore_import_operator.DatastoreImportOperator(bucket, file, namespace=None, entity_filter=None, labels=None, datastore_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, polling_interval_in_seconds=10, xcom_push=False, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Import entities from Cloud Storage to Google Cloud Datastore</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; container in Cloud Storage to store data</li>
<li><strong>file</strong> (<em>string</em>) &#x2013; path of the backup metadata file in the specified Cloud Storage bucket.
It should have the extension .overall_export_metadata</li>
<li><strong>namespace</strong> (<em>str</em>) &#x2013; optional namespace of the backup metadata file in
the specified Cloud Storage bucket.</li>
<li><strong>entity_filter</strong> (<em>dict</em>) &#x2013; description of what data from the project is included in
the export, refer to
<a class="reference external" href="https://cloud.google.com/datastore/docs/reference/rest/Shared.Types/EntityFilter">https://cloud.google.com/datastore/docs/reference/rest/Shared.Types/EntityFilter</a></li>
<li><strong>labels</strong> (<em>dict</em>) &#x2013; client-assigned labels for cloud storage</li>
<li><strong>datastore_conn_id</strong> (<em>string</em>) &#x2013; the name of the connection id to use</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have domain-wide
delegation enabled.</li>
<li><strong>polling_interval_in_seconds</strong> (<em>int</em>) &#x2013; number of seconds to wait before polling for
execution status again</li>
<li><strong>xcom_push</strong> (<em>bool</em>) &#x2013; push operation name to xcom for reference</li>
</ul>
</td>
</tr>
</tbody>
</table>



</div>
</div>
<div class="section" id="datastorehook">
<h4 class="sigil_not_in_toc">DatastoreHook</h4>

<pre>
class airflow.contrib.hooks.datastore_hook.DatastoreHook(datastore_conn_id=&apos;google_cloud_datastore_default&apos;, delegate_to=None)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook" title="airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook</span></code></a></p>
<p>Interact with Google Cloud Datastore. This hook uses the Google Cloud Platform
connection.</p>
<p>This object is not thread safe. If you want to make multiple requests
simultaneously, create one hook per thread.</p>

<pre>
allocate_ids(partialKeys)</pre>
<p>Allocate IDs for incomplete keys.
see <a class="reference external" href="https://cloud.google.com/datastore/docs/reference/rest/v1/projects/allocateIds">https://cloud.google.com/datastore/docs/reference/rest/v1/projects/allocateIds</a></p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>partialKeys</strong> &#x2013; a list of partial keys</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body">a list of full keys.</td>
</tr>
</tbody>
</table>




<pre>
begin_transaction()</pre>
<p>Get a new transaction handle</p>
<blockquote>
<div><div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last"><a class="reference external" href="https://cloud.google.com/datastore/docs/reference/rest/v1/projects/beginTransaction">https://cloud.google.com/datastore/docs/reference/rest/v1/projects/beginTransaction</a></p>
</div>
</div>
</blockquote>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Returns:</th>
<td class="field-body">a transaction handle</td>
</tr>
</tbody>
</table>




<pre>
commit(body)</pre>
<p>Commit a transaction, optionally creating, deleting or modifying some entities.</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last"><a class="reference external" href="https://cloud.google.com/datastore/docs/reference/rest/v1/projects/commit">https://cloud.google.com/datastore/docs/reference/rest/v1/projects/commit</a></p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>body</strong> &#x2013; the body of the commit request</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body">the response body of the commit request</td>
</tr>
</tbody>
</table>
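<p>A minimal sketch of a <cite>body</cite> for a transactional commit with a single upsert mutation (an illustrative example whose shape follows the commit REST reference above; the transaction handle would normally come from <cite>begin_transaction()</cite>):</p>

```python
def build_commit_body(transaction, key_path, properties):
    """Build a transactional commit body with one upsert mutation."""
    return {
        "mode": "TRANSACTIONAL",
        "transaction": transaction,
        "mutations": [{
            "upsert": {
                "key": {"path": key_path},
                "properties": properties,
            }
        }],
    }

body = build_commit_body(
    transaction="opaque-transaction-handle",  # placeholder value
    key_path=[{"kind": "Task", "name": "sample_task"}],
    properties={"done": {"booleanValue": False}},
)
```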




<pre>
delete_operation(name)</pre>
<p>Deletes the long-running operation</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>name</strong> &#x2013; the name of the operation resource</td>
</tr>
</tbody>
</table>




<pre>
export_to_storage_bucket(bucket, namespace=None, entity_filter=None, labels=None)</pre>
<p>Export entities from Cloud Datastore to Cloud Storage for backup</p>




<pre>
get_conn(version=&apos;v1&apos;)</pre>
<p>Returns a Google Cloud Datastore service object.</p>




<pre>
get_operation(name)</pre>
<p>Gets the latest state of a long-running operation</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>name</strong> &#x2013; the name of the operation resource</td>
</tr>
</tbody>
</table>




<pre>
import_from_storage_bucket(bucket, file, namespace=None, entity_filter=None, labels=None)</pre>
<p>Import a backup from Cloud Storage to Cloud Datastore</p>




<pre>
lookup(keys, read_consistency=None, transaction=None)</pre>
<p>Lookup some entities by key</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last"><a class="reference external" href="https://cloud.google.com/datastore/docs/reference/rest/v1/projects/lookup">https://cloud.google.com/datastore/docs/reference/rest/v1/projects/lookup</a></p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>keys</strong> &#x2013; the keys to lookup</li>
<li><strong>read_consistency</strong> &#x2013; the read consistency to use. default, strong or eventual.
Cannot be used with a transaction.</li>
<li><strong>transaction</strong> &#x2013; the transaction to use, if any.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body"><p class="first last">the response body of the lookup request.</p>
</td>
</tr>
</tbody>
</table>




<pre>
poll_operation_until_done(name, polling_interval_in_seconds)</pre>
<p>Poll backup operation state until it&#x2019;s completed</p>
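<p>The polling pattern is roughly the following (a standalone sketch, not the hook&#x2019;s actual code; <cite>get_state</cite> stands in for <cite>get_operation(name)</cite>, and the state path mirrors the operation metadata returned by the Datastore admin API):</p>

```python
import time

def poll_until_done(get_state, polling_interval_in_seconds):
    """Call get_state until the operation leaves the PROCESSING state."""
    while True:
        result = get_state()
        state = result.get("metadata", {}).get("common", {}).get("state")
        if state == "PROCESSING":
            time.sleep(polling_interval_in_seconds)
        else:
            return result
```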




<pre>
rollback(transaction)</pre>
<p>Roll back a transaction</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last"><a class="reference external" href="https://cloud.google.com/datastore/docs/reference/rest/v1/projects/rollback">https://cloud.google.com/datastore/docs/reference/rest/v1/projects/rollback</a></p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>transaction</strong> &#x2013; the transaction to roll back</td>
</tr>
</tbody>
</table>




<pre>
run_query(body)</pre>
<p>Run a query for entities.</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last"><a class="reference external" href="https://cloud.google.com/datastore/docs/reference/rest/v1/projects/runQuery">https://cloud.google.com/datastore/docs/reference/rest/v1/projects/runQuery</a></p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>body</strong> &#x2013; the body of the query request</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body">the batch of query results.</td>
</tr>
</tbody>
</table>
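<p>A minimal sketch of a <cite>body</cite> for <cite>run_query</cite>, selecting up to a fixed number of entities of a single kind (a hypothetical helper; field names follow the runQuery REST reference above):</p>

```python
def build_run_query_body(kind, limit=10, namespace=None):
    """Build a runQuery request body for a single entity kind."""
    body = {"query": {"kind": [{"name": kind}], "limit": limit}}
    if namespace is not None:
        body["partitionId"] = {"namespaceId": namespace}
    return body
```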






</div>
</div>
<div class="section" id="cloud-ml-engine">
<h3 class="sigil_not_in_toc">Cloud ML Engine</h3>
<div class="section" id="cloud-ml-engine-operators">
<h4 class="sigil_not_in_toc">Cloud ML Engine Operators</h4>
<ul class="simple">
<li><a class="reference internal" href="#mlenginebatchpredictionoperator"><span class="std std-ref">MLEngineBatchPredictionOperator</span></a> : Start a Cloud ML Engine batch prediction job.</li>
<li><a class="reference internal" href="#mlenginemodeloperator"><span class="std std-ref">MLEngineModelOperator</span></a> : Manages a Cloud ML Engine model.</li>
<li><a class="reference internal" href="#mlenginetrainingoperator"><span class="std std-ref">MLEngineTrainingOperator</span></a> : Start a Cloud ML Engine training job.</li>
<li><a class="reference internal" href="#mlengineversionoperator"><span class="std std-ref">MLEngineVersionOperator</span></a> : Manages a Cloud ML Engine model version.</li>
</ul>
<div class="section" id="mlenginebatchpredictionoperator">
<span id="id54"></span><h5 class="sigil_not_in_toc">MLEngineBatchPredictionOperator</h5>

<pre>
class airflow.contrib.operators.mlengine_operator.MLEngineBatchPredictionOperator(project_id, job_id, region, data_format, input_paths, output_path, model_name=None, version_name=None, uri=None, max_worker_count=None, runtime_version=None, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Start a Google Cloud ML Engine prediction job.</p>
<p>NOTE: For the model origin, users should choose exactly one of the
three options below:
1. Populate the &#x2018;uri&#x2019; field only, which should be a GCS location that
points to a TensorFlow SavedModel directory.
2. Populate the &#x2018;model_name&#x2019; field only, which refers to an existing
model; the default version of the model will be used.
3. Populate both the &#x2018;model_name&#x2019; and &#x2018;version_name&#x2019; fields, which
refers to a specific version of a specific model.</p>
<p>In options 2 and 3, both model and version name should contain the
minimal identifier. For instance, call</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">MLEngineBatchPredictionOperator</span><span class="p">(</span>
    <span class="o">...</span><span class="p">,</span>
    <span class="n">model_name</span><span class="o">=</span><span class="s1">&apos;my_model&apos;</span><span class="p">,</span>
    <span class="n">version_name</span><span class="o">=</span><span class="s1">&apos;my_version&apos;</span><span class="p">,</span>
    <span class="o">...</span><span class="p">)</span>
</pre>
</div>
</div>
<p>if the desired model version is
&#x201C;projects/my_project/models/my_model/versions/my_version&#x201D;.</p>
<p>See <a class="reference external" href="https://cloud.google.com/ml-engine/reference/rest/v1/projects.jobs">https://cloud.google.com/ml-engine/reference/rest/v1/projects.jobs</a>
for further documentation on the parameters.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>project_id</strong> (<em>string</em>) &#x2013; The Google Cloud project name where the
prediction job is submitted. (templated)</li>
<li><strong>job_id</strong> (<em>string</em>) &#x2013; A unique id for the prediction job on Google Cloud
ML Engine. (templated)</li>
<li><strong>data_format</strong> (<em>string</em>) &#x2013; The format of the input data.
It will default to &#x2018;DATA_FORMAT_UNSPECIFIED&#x2019; if it is not provided
or is not one of [&#x201C;TEXT&#x201D;, &#x201C;TF_RECORD&#x201D;, &#x201C;TF_RECORD_GZIP&#x201D;].</li>
<li><strong>input_paths</strong> (<em>list of string</em>) &#x2013; A list of GCS paths of input data for batch
prediction. The wildcard operator <code class="docutils literal notranslate"><span class="pre">*</span></code> is accepted, but only at the end of a path. (templated)</li>
<li><strong>output_path</strong> (<em>string</em>) &#x2013; The GCS path where the prediction results are
written to. (templated)</li>
<li><strong>region</strong> (<em>string</em>) &#x2013; The Google Compute Engine region to run the
prediction job in. (templated)</li>
<li><strong>model_name</strong> (<em>string</em>) &#x2013; The Google Cloud ML Engine model to use for prediction.
If version_name is not provided, the default version of this
model will be used.
Should not be None if version_name is provided.
Should be None if uri is provided. (templated)</li>
<li><strong>version_name</strong> (<em>string</em>) &#x2013; The Google Cloud ML Engine model version to use for
prediction.
Should be None if uri is provided. (templated)</li>
<li><strong>uri</strong> (<em>string</em>) &#x2013; The GCS path of the saved model to use for prediction.
Should be None if model_name is provided.
It should be a GCS path pointing to a tensorflow SavedModel. (templated)</li>
<li><strong>max_worker_count</strong> (<em>int</em>) &#x2013; The maximum number of workers to be used
for parallel processing. Defaults to 10 if not specified.</li>
<li><strong>runtime_version</strong> (<em>string</em>) &#x2013; The Google Cloud ML Engine runtime version to use
for batch prediction.</li>
<li><strong>gcp_conn_id</strong> (<em>string</em>) &#x2013; The connection ID used for connection to Google
Cloud Platform.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must
have domain-wide delegation enabled.</li>
</ul>
</td>
</tr>
</tbody>
</table>

<pre>Raises:</pre>
<code class="docutils literal notranslate"><span class="pre">ValueError</span></code>: if a unique model/version origin cannot be determined.
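<p>The model-origin rule above can be sketched as plain validation logic (a hypothetical helper that mirrors the description, not the operator&#x2019;s actual implementation):</p>

```python
def resolve_prediction_origin(uri=None, model_name=None, version_name=None):
    """Return the chosen model origin, enforcing exactly one of the
    three configurations described above."""
    if uri:
        if model_name or version_name:
            raise ValueError("Ambiguous origin: 'uri' cannot be combined "
                             "with 'model_name'/'version_name'.")
        return {"uri": uri}
    if model_name:
        origin = {"modelName": model_name}
        if version_name:
            origin["versionName"] = version_name
        return origin
    raise ValueError("No model origin: provide 'uri' or 'model_name'.")
```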




</div>
<div class="section" id="mlenginemodeloperator">
<span id="id57"></span><h5 class="sigil_not_in_toc">MLEngineModelOperator</h5>

<pre>
class airflow.contrib.operators.mlengine_operator.MLEngineModelOperator(project_id, model, operation=&apos;create&apos;, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Operator for managing a Google Cloud ML Engine model.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>project_id</strong> (<em>string</em>) &#x2013; The Google Cloud project name to which MLEngine
model belongs. (templated)</li>
<li><strong>model</strong> (<em>dict</em>) &#x2013; <p>A dictionary containing the information about the model.
If the <cite>operation</cite> is <cite>create</cite>, then the <cite>model</cite> parameter should
contain all the information about this model such as <cite>name</cite>.</p>
<p>If the <cite>operation</cite> is <cite>get</cite>, the <cite>model</cite> parameter
should contain the <cite>name</cite> of the model.</p>
</li>
<li><strong>operation</strong> &#x2013; <p>The operation to perform. Available operations are:</p>
<ul>
<li><code class="docutils literal notranslate"><span class="pre">create</span></code>: Creates a new model as provided by the <cite>model</cite> parameter.</li>
<li><code class="docutils literal notranslate"><span class="pre">get</span></code>: Gets a particular model where the name is specified in <cite>model</cite>.</li>
</ul>
</li>
<li><strong>gcp_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use when fetching connection info.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have
domain-wide delegation enabled.</li>
</ul>
</td>
</tr>
</tbody>
</table>



</div>
<div class="section" id="mlenginetrainingoperator">
<span id="id58"></span><h5 class="sigil_not_in_toc">MLEngineTrainingOperator</h5>

<pre>
class airflow.contrib.operators.mlengine_operator.MLEngineTrainingOperator(project_id, job_id, package_uris, training_python_module, training_args, region, scale_tier=None, runtime_version=None, python_version=None, job_dir=None, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, mode=&apos;PRODUCTION&apos;, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Operator for launching an MLEngine training job.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>project_id</strong> (<em>string</em>) &#x2013; The Google Cloud project name within which MLEngine
training job should run (templated).</li>
<li><strong>job_id</strong> (<em>string</em>) &#x2013; A unique templated id for the submitted Google MLEngine
training job. (templated)</li>
<li><strong>package_uris</strong> (<em>string</em>) &#x2013; A list of package locations for MLEngine training job,
which should include the main training program + any additional
dependencies. (templated)</li>
<li><strong>training_python_module</strong> (<em>string</em>) &#x2013; The Python module name to run within MLEngine
training job after installing &#x2018;package_uris&#x2019; packages. (templated)</li>
<li><strong>training_args</strong> (<em>string</em>) &#x2013; A list of templated command line arguments to pass to
the MLEngine training program. (templated)</li>
<li><strong>region</strong> (<em>string</em>) &#x2013; The Google Compute Engine region to run the MLEngine training
job in (templated).</li>
<li><strong>scale_tier</strong> (<em>string</em>) &#x2013; Resource tier for MLEngine training job. (templated)</li>
<li><strong>runtime_version</strong> (<em>string</em>) &#x2013; The Google Cloud ML runtime version to use for
training. (templated)</li>
<li><strong>python_version</strong> (<em>string</em>) &#x2013; The version of Python used in training. (templated)</li>
<li><strong>job_dir</strong> (<em>string</em>) &#x2013; A Google Cloud Storage path in which to store training
outputs and other data needed for training. (templated)</li>
<li><strong>gcp_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use when fetching connection info.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have
domain-wide delegation enabled.</li>
<li><strong>mode</strong> (<em>string</em>) &#x2013; Can be one of &#x2018;DRY_RUN&#x2019;/&#x2019;CLOUD&#x2019;. In &#x2018;DRY_RUN&#x2019; mode, no real
training job will be launched, but the MLEngine training job request
will be printed out. In &#x2018;CLOUD&#x2019; mode, a real MLEngine training job
creation request will be issued.</li>
</ul>
</td>
</tr>
</tbody>
</table>
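<p>Under the hood, the operator submits a job payload to the MLEngine jobs API. A rough sketch of assembling that payload from the parameters above (illustrative only, not the operator&#x2019;s actual code; field names follow the projects.jobs REST resource):</p>

```python
def build_training_request(job_id, package_uris, python_module, args, region,
                           scale_tier=None, runtime_version=None):
    """Assemble a training job request body from the operator parameters."""
    training_input = {
        "packageUris": package_uris,
        "pythonModule": python_module,
        "args": args,
        "region": region,
    }
    if scale_tier is not None:
        training_input["scaleTier"] = scale_tier
    if runtime_version is not None:
        training_input["runtimeVersion"] = runtime_version
    return {"jobId": job_id, "trainingInput": training_input}
```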



</div>
<div class="section" id="mlengineversionoperator">
<span id="id59"></span><h5 class="sigil_not_in_toc">MLEngineVersionOperator</h5>

<pre>
class airflow.contrib.operators.mlengine_operator.MLEngineVersionOperator(project_id, model_name, version_name=None, version=None, operation=&apos;create&apos;, gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Operator for managing a Google Cloud ML Engine version.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>project_id</strong> (<em>string</em>) &#x2013; The Google Cloud project name to which MLEngine
model belongs.</li>
<li><strong>model_name</strong> (<em>string</em>) &#x2013; The name of the Google Cloud ML Engine model that the version
belongs to. (templated)</li>
<li><strong>version_name</strong> (<em>string</em>) &#x2013; A name to use for the version being operated upon.
If not None and the <cite>version</cite> argument is None or does not have a value for
the <cite>name</cite> key, then this will be populated in the payload for the
<cite>name</cite> key. (templated)</li>
<li><strong>version</strong> (<em>dict</em>) &#x2013; A dictionary containing the information about the version.
If the <cite>operation</cite> is <cite>create</cite>, <cite>version</cite> should contain all the
information about this version such as name, and deploymentUrl.
If the <cite>operation</cite> is <cite>get</cite> or <cite>delete</cite>, the <cite>version</cite> parameter
should contain the <cite>name</cite> of the version.
If it is None, the only <cite>operation</cite> possible would be <cite>list</cite>. (templated)</li>
<li><strong>operation</strong> (<em>string</em>) &#x2013; <p>The operation to perform. Available operations are:</p>
<ul>
<li><code class="docutils literal notranslate"><span class="pre">create</span></code>: Creates a new version in the model specified by <cite>model_name</cite>,
in which case the <cite>version</cite> parameter should contain all the
information to create that version
(e.g. <cite>name</cite>, <cite>deploymentUrl</cite>).</li>
<li><code class="docutils literal notranslate"><span class="pre">get</span></code>: Gets full information of a particular version in the model
specified by <cite>model_name</cite>.
The name of the version should be specified in the <cite>version</cite>
parameter.</li>
<li><code class="docutils literal notranslate"><span class="pre">list</span></code>: Lists all available versions of the model specified
by <cite>model_name</cite>.</li>
<li><code class="docutils literal notranslate"><span class="pre">delete</span></code>: Deletes the version specified in the <cite>version</cite> parameter from the
model specified by <cite>model_name</cite>.
The name of the version should be specified in the <cite>version</cite>
parameter.</li>
</ul>
</li>
<li><strong>gcp_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use when fetching connection info.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have
domain-wide delegation enabled.</li>
</ul>
</td>
</tr>
</tbody>
</table>
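<p>For a <cite>create</cite> operation, the <cite>version</cite> dictionary might look like the following (a hedged sketch; note that in the REST Version resource the deployment location field is spelled <cite>deploymentUri</cite>):</p>

```python
def build_version_payload(version_name, deployment_uri, runtime_version=None):
    """Build a minimal `version` dict for a `create` operation."""
    payload = {"name": version_name, "deploymentUri": deployment_uri}
    if runtime_version is not None:
        payload["runtimeVersion"] = runtime_version
    return payload

version = build_version_payload("my_version", "gs://my-bucket/saved_model/")
```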



</div>
</div>
<div class="section" id="cloud-ml-engine-hook">
<h4 class="sigil_not_in_toc">Cloud ML Engine Hook</h4>
<div class="section" id="mlenginehook">
<span id="id60"></span><h5 class="sigil_not_in_toc">MLEngineHook</h5>

<pre>
class airflow.contrib.hooks.gcp_mlengine_hook.MLEngineHook(gcp_conn_id=&apos;google_cloud_default&apos;, delegate_to=None)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook" title="airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook</span></code></a></p>

<pre>
create_job(project_id, job, use_existing_job_fn=None)</pre>
<p>Launches an MLEngine job and waits for it to reach a terminal state.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>project_id</strong> (<em>string</em>) &#x2013; The Google Cloud project id within which the MLEngine
job will be launched.</li>
<li><strong>job</strong> (<em>dict</em>) &#x2013; <p>MLEngine Job object that should be provided to the MLEngine
API, such as:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">{</span>
  <span class="s1">&apos;jobId&apos;</span><span class="p">:</span> <span class="s1">&apos;my_job_id&apos;</span><span class="p">,</span>
  <span class="s1">&apos;trainingInput&apos;</span><span class="p">:</span> <span class="p">{</span>
    <span class="s1">&apos;scaleTier&apos;</span><span class="p">:</span> <span class="s1">&apos;STANDARD_1&apos;</span><span class="p">,</span>
    <span class="o">...</span>
  <span class="p">}</span>
<span class="p">}</span>
</pre>
</div>
</div>
</li>
<li><strong>use_existing_job_fn</strong> (<em>function</em>) &#x2013; In case an MLEngine job with the same
job_id already exists, this function (if provided) decides whether to
reuse the existing job, i.e. continue waiting for it to finish and
return the job object. It should accept an MLEngine job object and
return a boolean indicating whether it is OK to reuse the existing
job. If &#x2018;use_existing_job_fn&#x2019; is not provided, the existing
MLEngine job is reused by default.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body"><p class="first">The MLEngine job object if the job successfully reaches a
terminal state (which might be the FAILED or CANCELLED state).</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">Return type:</th>
<td class="field-body"><p class="first last">dict</p>
</td>
</tr>
</tbody>
</table>




<pre>
create_model(project_id, model)</pre>
<p>Create a Model. Blocks until finished.</p>




<pre>
create_version(project_id, model_name, version_spec)</pre>
<p>Creates the Version on Google Cloud ML Engine.</p>
<p>Returns the operation if the version was created successfully and
raises an error otherwise.</p>




<pre>
delete_version(project_id, model_name, version_name)</pre>
<p>Deletes the given version of a model. Blocks until finished.</p>




<pre>
get_conn()</pre>
<p>Returns a Google MLEngine service object.</p>




<pre>
get_model(project_id, model_name)</pre>
<p>Gets a Model. Blocks until finished.</p>




<pre>
list_versions(project_id, model_name)</pre>
<p>Lists all available versions of a model. Blocks until finished.</p>




<pre>
set_default_version(project_id, model_name, version_name)</pre>
<p>Sets a version to be the default. Blocks until finished.</p>






</div>
</div>
</div>
<div class="section" id="cloud-storage">
<h3 class="sigil_not_in_toc">Cloud Storage</h3>
<div class="section" id="storage-operators">
<h4 class="sigil_not_in_toc">Storage Operators</h4>
<ul class="simple">
<li><a class="reference internal" href="#filetogooglecloudstorageoperator"><span class="std std-ref">FileToGoogleCloudStorageOperator</span></a> : Uploads a file to Google Cloud Storage.</li>
<li><a class="reference internal" href="#googlecloudstoragecreatebucketoperator"><span class="std std-ref">GoogleCloudStorageCreateBucketOperator</span></a> : Creates a new cloud storage bucket.</li>
<li><a class="reference internal" href="#googlecloudstoragelistoperator"><span class="std std-ref">GoogleCloudStorageListOperator</span></a> : Lists all objects in a bucket whose names match the given string prefix and delimiter.</li>
<li><a class="reference internal" href="#googlecloudstoragedownloadoperator"><span class="std std-ref">GoogleCloudStorageDownloadOperator</span></a> : Downloads a file from Google Cloud Storage.</li>
<li><a class="reference internal" href="#googlecloudstoragetobigqueryoperator"><span class="std std-ref">GoogleCloudStorageToBigQueryOperator</span></a> : Loads files from Google Cloud Storage into BigQuery.</li>
<li><a class="reference internal" href="#googlecloudstoragetogooglecloudstorageoperator"><span class="std std-ref">GoogleCloudStorageToGoogleCloudStorageOperator</span></a> : Copies objects from one bucket to another, with renaming if requested.</li>
</ul>
<div class="section" id="filetogooglecloudstorageoperator">
<span id="id61"></span><h5 class="sigil_not_in_toc">FileToGoogleCloudStorageOperator</h5>

<pre>
class airflow.contrib.operators.file_to_gcs.FileToGoogleCloudStorageOperator(src, dst, bucket, google_cloud_storage_conn_id=&apos;google_cloud_default&apos;, mime_type=&apos;application/octet-stream&apos;, delegate_to=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Uploads a file to Google Cloud Storage</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>src</strong> (<em>string</em>) &#x2013; Path to the local file. (templated)</li>
<li><strong>dst</strong> (<em>string</em>) &#x2013; Destination path within the specified bucket. (templated)</li>
<li><strong>bucket</strong> (<em>string</em>) &#x2013; The bucket to upload to. (templated)</li>
<li><strong>google_cloud_storage_conn_id</strong> (<em>string</em>) &#x2013; The Airflow connection ID to upload with</li>
<li><strong>mime_type</strong> (<em>string</em>) &#x2013; The mime-type string</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any</li>
</ul>
</td>
</tr>
</tbody>
</table>
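<p>The default <cite>mime_type</cite> is <cite>application/octet-stream</cite>. If you would rather derive it from the file name, the standard library can help (a small sketch, independent of the operator itself):</p>

```python
import mimetypes

def guess_mime_type(path, default="application/octet-stream"):
    """Guess a mime type from the file extension, falling back to a default."""
    guessed, _ = mimetypes.guess_type(path)
    return guessed or default
```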

<pre>
execute(context)</pre>
<p>Uploads the file to Google Cloud Storage.</p>






</div>
<div class="section" id="googlecloudstoragecreatebucketoperator">
<span id="id62"></span><h5 class="sigil_not_in_toc">GoogleCloudStorageCreateBucketOperator</h5>

<pre>
class airflow.contrib.operators.gcs_operator.GoogleCloudStorageCreateBucketOperator(bucket_name, storage_class=&apos;MULTI_REGIONAL&apos;, location=&apos;US&apos;, project_id=None, labels=None, google_cloud_storage_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Creates a new bucket. Google Cloud Storage uses a flat namespace,
so you can&#x2019;t create a bucket with a name that is already in use.</p>
<blockquote>
<div><div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last">For more information, see Bucket Naming Guidelines:
<a class="reference external" href="https://cloud.google.com/storage/docs/bucketnaming.html#requirements">https://cloud.google.com/storage/docs/bucketnaming.html#requirements</a></p>
</div>
</div>
</blockquote>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket_name</strong> (<em>string</em>) &#x2013; The name of the bucket. (templated)</li>
<li><strong>storage_class</strong> (<em>string</em>) &#x2013; <p>This defines how objects in the bucket are stored
and determines the SLA and the cost of storage (templated). Values include</p>
<ul>
<li><code class="docutils literal notranslate"><span class="pre">MULTI_REGIONAL</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">REGIONAL</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">STANDARD</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">NEARLINE</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">COLDLINE</span></code>.</li>
</ul>
<p>If this value is not specified when the bucket is
created, it will default to STANDARD.</p>
</li>
<li><strong>location</strong> (<em>string</em>) &#x2013; <p>The location of the bucket. (templated)
Object data for objects in the bucket resides in physical storage
within this region. Defaults to US.</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last"><a class="reference external" href="https://developers.google.com/storage/docs/bucket-locations">https://developers.google.com/storage/docs/bucket-locations</a></p>
</div>
</li>
<li><strong>project_id</strong> (<em>string</em>) &#x2013; The ID of the GCP Project. (templated)</li>
<li><strong>labels</strong> (<em>dict</em>) &#x2013; User-provided labels, in key/value pairs.</li>
<li><strong>google_cloud_storage_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use when
connecting to Google cloud storage.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must
have domain-wide delegation enabled.</li>
</ul>
</td>
</tr>
</tbody>
</table>

<p><strong>Example:</strong></p>
<p class="first">The following Operator would create a new bucket <code class="docutils literal notranslate"><span class="pre">test-bucket</span></code>
with <code class="docutils literal notranslate"><span class="pre">MULTI_REGIONAL</span></code> storage class in the <code class="docutils literal notranslate"><span class="pre">EU</span></code> region.</p>
<div class="last highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">CreateBucket</span> <span class="o">=</span> <span class="n">GoogleCloudStorageCreateBucketOperator</span><span class="p">(</span>
    <span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;CreateNewBucket&apos;</span><span class="p">,</span>
    <span class="n">bucket_name</span><span class="o">=</span><span class="s1">&apos;test-bucket&apos;</span><span class="p">,</span>
    <span class="n">storage_class</span><span class="o">=</span><span class="s1">&apos;MULTI_REGIONAL&apos;</span><span class="p">,</span>
    <span class="n">location</span><span class="o">=</span><span class="s1">&apos;EU&apos;</span><span class="p">,</span>
    <span class="n">labels</span><span class="o">=</span><span class="p">{</span><span class="s1">&apos;env&apos;</span><span class="p">:</span> <span class="s1">&apos;dev&apos;</span><span class="p">,</span> <span class="s1">&apos;team&apos;</span><span class="p">:</span> <span class="s1">&apos;airflow&apos;</span><span class="p">},</span>
    <span class="n">google_cloud_storage_conn_id</span><span class="o">=</span><span class="s1">&apos;airflow-service-account&apos;</span>
<span class="p">)</span>
</pre>
</div>
</div>





</div>
<div class="section" id="googlecloudstoragedownloadoperator">
<span id="id63"></span><h5 class="sigil_not_in_toc">GoogleCloudStorageDownloadOperator</h5>

<pre>
class airflow.contrib.operators.gcs_download_operator.GoogleCloudStorageDownloadOperator(bucket, object, filename=None, store_to_xcom_key=None, google_cloud_storage_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Downloads a file from Google Cloud Storage.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; The Google cloud storage bucket where the object is. (templated)</li>
<li><strong>object</strong> (<em>string</em>) &#x2013; The name of the object to download in the Google cloud
storage bucket. (templated)</li>
<li><strong>filename</strong> (<em>string</em>) &#x2013; The file path on the local file system (where the
operator is being executed) that the file should be downloaded to. (templated)
If no filename passed, the downloaded data will not be stored on the local file
system.</li>
<li><strong>store_to_xcom_key</strong> (<em>string</em>) &#x2013; If this param is set, the operator will push
the contents of the downloaded file to XCom with the key set in this
parameter. If not set, the downloaded data will not be pushed to XCom. (templated)</li>
<li><strong>google_cloud_storage_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use when
connecting to Google cloud storage.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have
domain-wide delegation enabled.</li>
</ul>
</td>
</tr>
</tbody>
</table>
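<p>As a hedged sketch (the task ID, bucket, object and local path below are
invented for illustration, not defaults), a download task might be declared
like this:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>download_report = GoogleCloudStorageDownloadOperator(
    task_id='download_report',
    bucket='data',
    object='sales/sales-2017/january.avro',
    filename='/tmp/january.avro',
    google_cloud_storage_conn_id='google_cloud_default'
)
</pre>
</div>
</div>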



</div>
<div class="section" id="googlecloudstoragelistoperator">
<span id="id64"></span><h5 class="sigil_not_in_toc">GoogleCloudStorageListOperator</h5>

<pre>
class airflow.contrib.operators.gcs_list_operator.GoogleCloudStorageListOperator(bucket, prefix=None, delimiter=None, google_cloud_storage_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Lists all objects from the bucket with the given string prefix and delimiter in the name.</p>
<p>This operator returns a Python list with the names of objects which can be used by
<cite>xcom</cite> in the downstream task.</p>

<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; The Google cloud storage bucket to find the objects. (templated)</li>
<li><strong>prefix</strong> (<em>string</em>) &#x2013; Prefix string which filters objects whose name begin with
this prefix. (templated)</li>
<li><strong>delimiter</strong> (<em>string</em>) &#x2013; The delimiter by which you want to filter the objects. (templated)
For example, to list the CSV files in a directory in GCS you would use
delimiter=&#x2019;.csv&#x2019;.</li>
<li><strong>google_cloud_storage_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use when
connecting to Google cloud storage.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have
domain-wide delegation enabled.</li>
</ul>
</td>
</tr>
</tbody>
</table>

<p><strong>Example:</strong></p>
<p class="first">The following Operator would list all the Avro files from <code class="docutils literal notranslate"><span class="pre">sales/sales-2017</span></code>
folder in <code class="docutils literal notranslate"><span class="pre">data</span></code> bucket.</p>
<div class="last highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">GCS_Files</span> <span class="o">=</span> <span class="n">GoogleCloudStorageListOperator</span><span class="p">(</span>
    <span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;GCS_Files&apos;</span><span class="p">,</span>
    <span class="n">bucket</span><span class="o">=</span><span class="s1">&apos;data&apos;</span><span class="p">,</span>
    <span class="n">prefix</span><span class="o">=</span><span class="s1">&apos;sales/sales-2017/&apos;</span><span class="p">,</span>
    <span class="n">delimiter</span><span class="o">=</span><span class="s1">&apos;.avro&apos;</span><span class="p">,</span>
    <span class="n">google_cloud_storage_conn_id</span><span class="o">=</span><span class="n">google_cloud_conn_id</span>
<span class="p">)</span>
</pre>
</div>
</div>





</div>
<div class="section" id="googlecloudstoragetobigqueryoperator">
<span id="id65"></span><h5 class="sigil_not_in_toc">GoogleCloudStorageToBigQueryOperator</h5>

<pre>
class airflow.contrib.operators.gcs_to_bq.GoogleCloudStorageToBigQueryOperator(bucket, source_objects, destination_project_dataset_table, schema_fields=None, schema_object=None, source_format=&apos;CSV&apos;, compression=&apos;NONE&apos;, create_disposition=&apos;CREATE_IF_NEEDED&apos;, skip_leading_rows=0, write_disposition=&apos;WRITE_EMPTY&apos;, field_delimiter=&apos;,&apos;, max_bad_records=0, quote_character=None, ignore_unknown_values=False, allow_quoted_newlines=False, allow_jagged_rows=False, max_id_key=None, bigquery_conn_id=&apos;bigquery_default&apos;, google_cloud_storage_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, schema_update_options=(), src_fmt_configs={}, external_table=False, time_partitioning={}, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Loads files from Google cloud storage into BigQuery.</p>
<p>The schema to be used for the BigQuery table may be specified in one of
two ways. You may either directly pass the schema fields in, or you may
point the operator to a Google cloud storage object name. The object in
Google cloud storage must be a JSON file with the schema fields in it.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; The bucket to load from. (templated)</li>
<li><strong>source_objects</strong> (<em>list</em>) &#x2013; List of Google cloud storage URIs to load from. (templated)
If source_format is &#x2018;DATASTORE_BACKUP&#x2019;, the list must only contain a single URI.</li>
<li><strong>destination_project_dataset_table</strong> (<em>string</em>) &#x2013; The dotted (&lt;project&gt;.)&lt;dataset&gt;.&lt;table&gt;
BigQuery table to load data into. If &lt;project&gt; is not included,
project will be the project defined in the connection json. (templated)</li>
<li><strong>schema_fields</strong> (<em>list</em>) &#x2013; If set, the schema field list as defined here:
<a class="reference external" href="https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load">https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load</a>
Should not be set when source_format is &#x2018;DATASTORE_BACKUP&#x2019;.</li>
<li><strong>schema_object</strong> (<em>string</em>) &#x2013; If set, a GCS object path pointing to a .json file that
contains the schema for the table. (templated)</li>
<li><strong>source_format</strong> (<em>string</em>) &#x2013; The file format of the data to load.</li>
<li><strong>compression</strong> (<em>string</em>) &#x2013; [Optional] The compression type of the data source.
Possible values include GZIP and NONE.
The default value is NONE.
This setting is ignored for Google Cloud Bigtable,
Google Cloud Datastore backups and Avro formats.</li>
<li><strong>create_disposition</strong> (<em>string</em>) &#x2013; The create disposition if the table doesn&#x2019;t exist.</li>
<li><strong>skip_leading_rows</strong> (<em>int</em>) &#x2013; Number of rows to skip when loading from a CSV.</li>
<li><strong>write_disposition</strong> (<em>string</em>) &#x2013; The write disposition if the table already exists.</li>
<li><strong>field_delimiter</strong> (<em>string</em>) &#x2013; The delimiter to use when loading from a CSV.</li>
<li><strong>max_bad_records</strong> (<em>int</em>) &#x2013; The maximum number of bad records that BigQuery can
ignore when running the job.</li>
<li><strong>quote_character</strong> (<em>string</em>) &#x2013; The value that is used to quote data sections in a CSV file.</li>
<li><strong>ignore_unknown_values</strong> (<em>bool</em>) &#x2013; [Optional] Indicates if BigQuery should allow
extra values that are not represented in the table schema.
If true, the extra values are ignored. If false, records with extra columns
are treated as bad records, and if there are too many bad records, an
invalid error is returned in the job result.</li>
<li><strong>allow_quoted_newlines</strong> (<em>boolean</em>) &#x2013; Whether to allow quoted newlines (true) or not (false).</li>
<li><strong>allow_jagged_rows</strong> (<em>bool</em>) &#x2013; Accept rows that are missing trailing optional columns.
The missing values are treated as nulls. If false, records with missing trailing
columns are treated as bad records, and if there are too many bad records, an
invalid error is returned in the job result. Only applicable to CSV, ignored
for other formats.</li>
<li><strong>max_id_key</strong> (<em>string</em>) &#x2013; If set, the name of a column in the BigQuery table
that&#x2019;s to be loaded. This will be used to select the MAX value from
BigQuery after the load occurs. The results will be returned by the
execute() command, which in turn gets stored in XCom for future
operators to use. This can be helpful with incremental loads; during
future executions, you can pick up from the max ID.</li>
<li><strong>bigquery_conn_id</strong> (<em>string</em>) &#x2013; Reference to a specific BigQuery hook.</li>
<li><strong>google_cloud_storage_conn_id</strong> (<em>string</em>) &#x2013; Reference to a specific Google
cloud storage hook.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any. For this to
work, the service account making the request must have domain-wide
delegation enabled.</li>
<li><strong>schema_update_options</strong> (<em>list</em>) &#x2013; Allows the schema of the destination
table to be updated as a side effect of the load job.</li>
<li><strong>src_fmt_configs</strong> (<em>dict</em>) &#x2013; configure optional fields specific to the source format</li>
<li><strong>external_table</strong> (<em>bool</em>) &#x2013; Flag to specify if the destination table should be
a BigQuery external table. Default Value is False.</li>
<li><strong>time_partitioning</strong> (<em>dict</em>) &#x2013; configure optional time partitioning fields, i.e.
partition by field, type and expiration as per API specifications.
Note that &#x2018;field&#x2019; cannot be used together with
dataset.table$partition.</li>
</ul>
</td>
</tr>
</tbody>
</table>
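<p>For illustration, a CSV load with an inline schema might look as follows
(the bucket, dataset, table and field names here are invented, not defaults):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>load_sales = GoogleCloudStorageToBigQueryOperator(
    task_id='gcs_to_bq',
    bucket='data',
    source_objects=['sales/sales-2017/*.csv'],
    destination_project_dataset_table='my_dataset.sales_2017',
    schema_fields=[
        {'name': 'item', 'type': 'STRING', 'mode': 'NULLABLE'},
        {'name': 'amount', 'type': 'INTEGER', 'mode': 'NULLABLE'},
    ],
    skip_leading_rows=1,
    write_disposition='WRITE_TRUNCATE'
)
</pre>
</div>
</div>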



</div>
<div class="section" id="googlecloudstoragetogooglecloudstorageoperator">
<span id="id66"></span><h5 class="sigil_not_in_toc">GoogleCloudStorageToGoogleCloudStorageOperator</h5>

<pre>
class airflow.contrib.operators.gcs_to_gcs.GoogleCloudStorageToGoogleCloudStorageOperator(source_bucket, source_object, destination_bucket=None, destination_object=None, move_object=False, google_cloud_storage_conn_id=&apos;google_cloud_default&apos;, delegate_to=None, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>
<p>Copies objects from one bucket to another, with renaming if requested.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>source_bucket</strong> (<em>string</em>) &#x2013; The source Google cloud storage bucket where the
object is. (templated)</li>
<li><strong>source_object</strong> (<em>string</em>) &#x2013; <p>The source name of the object to copy in the Google cloud
storage bucket. (templated)
If wildcards are used in this argument:</p>
<blockquote>
<div>You can use only one wildcard for objects (filenames) within your
bucket. The wildcard can appear inside the object name or at the
end of the object name. Appending a wildcard to the bucket name is
unsupported.</div>
</blockquote>
</li>
<li><strong>destination_bucket</strong> (<em>string</em>) &#x2013; The destination Google cloud storage bucket
where the object should be. (templated)</li>
<li><strong>destination_object</strong> (<em>string</em>) &#x2013; The destination name of the object in the
destination Google cloud storage bucket. (templated)
If a wildcard is supplied in the source_object argument, this is the
prefix that will be prepended to the final destination objects&#x2019; paths.
Note that the source path&#x2019;s part before the wildcard will be removed;
if it needs to be retained it should be appended to destination_object.
For example, with prefix <code class="docutils literal notranslate"><span class="pre">foo/*</span></code> and destination_object <code class="docutils literal notranslate"><span class="pre">blah/</span></code>, the
file <code class="docutils literal notranslate"><span class="pre">foo/baz</span></code> will be copied to <code class="docutils literal notranslate"><span class="pre">blah/baz</span></code>; to retain the prefix write
the destination_object as e.g. <code class="docutils literal notranslate"><span class="pre">blah/foo</span></code>, in which case the copied file
will be named <code class="docutils literal notranslate"><span class="pre">blah/foo/baz</span></code>.</li>
<li><strong>move_object</strong> (<em>bool</em>) &#x2013; When move_object is True, the object is moved instead
of copied to the new location. This is the equivalent of a mv command
as opposed to a cp command.</li>
<li><strong>google_cloud_storage_conn_id</strong> (<em>string</em>) &#x2013; The connection ID to use when
connecting to Google cloud storage.</li>
<li><strong>delegate_to</strong> (<em>string</em>) &#x2013; The account to impersonate, if any.
For this to work, the service account making the request must have
domain-wide delegation enabled.</li>
</ul>
</td>
</tr>
</tbody>
</table>
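<p>The destination-object naming rule for wildcard copies described above can be
sketched in plain Python (this is an illustrative model of the documented rule,
not the operator&#x2019;s actual implementation):</p>

```python
def rename_for_destination(source_name, source_object, destination_object):
    """Map a matched source object name to its destination path.

    The part of the source path before the wildcard is stripped and
    replaced by destination_object.
    """
    prefix = source_object.split('*', 1)[0]  # part before the wildcard
    return destination_object + source_name[len(prefix):]

# With prefix 'foo/*' and destination_object 'blah/',
# the file 'foo/baz' is copied to 'blah/baz':
print(rename_for_destination('foo/baz', 'foo/*', 'blah/'))      # blah/baz
# To retain the prefix, append it to destination_object:
print(rename_for_destination('foo/baz', 'foo/*', 'blah/foo/'))  # blah/foo/baz
```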

<p><strong>Examples:</strong></p>
<p class="first">The following Operator would copy a single file named
<code class="docutils literal notranslate"><span class="pre">sales/sales-2017/january.avro</span></code> in the <code class="docutils literal notranslate"><span class="pre">data</span></code> bucket to the file named
<code class="docutils literal notranslate"><span class="pre">copied_sales/2017/january-backup.avro</span></code> in the <code class="docutils literal notranslate"><span class="pre">data_backup</span></code> bucket.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">copy_single_file</span> <span class="o">=</span> <span class="n">GoogleCloudStorageToGoogleCloudStorageOperator</span><span class="p">(</span>
    <span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;copy_single_file&apos;</span><span class="p">,</span>
    <span class="n">source_bucket</span><span class="o">=</span><span class="s1">&apos;data&apos;</span><span class="p">,</span>
    <span class="n">source_object</span><span class="o">=</span><span class="s1">&apos;sales/sales-2017/january.avro&apos;</span><span class="p">,</span>
    <span class="n">destination_bucket</span><span class="o">=</span><span class="s1">&apos;data_backup&apos;</span><span class="p">,</span>
    <span class="n">destination_object</span><span class="o">=</span><span class="s1">&apos;copied_sales/2017/january-backup.avro&apos;</span><span class="p">,</span>
    <span class="n">google_cloud_storage_conn_id</span><span class="o">=</span><span class="n">google_cloud_conn_id</span>
<span class="p">)</span>
</pre>
</div>
</div>
<p>The following Operator would copy all the Avro files from <code class="docutils literal notranslate"><span class="pre">sales/sales-2017</span></code>
folder (i.e. with names starting with that prefix) in <code class="docutils literal notranslate"><span class="pre">data</span></code> bucket to the
<code class="docutils literal notranslate"><span class="pre">copied_sales/2017</span></code> folder in the <code class="docutils literal notranslate"><span class="pre">data_backup</span></code> bucket.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">copy_files</span> <span class="o">=</span> <span class="n">GoogleCloudStorageToGoogleCloudStorageOperator</span><span class="p">(</span>
    <span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;copy_files&apos;</span><span class="p">,</span>
    <span class="n">source_bucket</span><span class="o">=</span><span class="s1">&apos;data&apos;</span><span class="p">,</span>
    <span class="n">source_object</span><span class="o">=</span><span class="s1">&apos;sales/sales-2017/*.avro&apos;</span><span class="p">,</span>
    <span class="n">destination_bucket</span><span class="o">=</span><span class="s1">&apos;data_backup&apos;</span><span class="p">,</span>
    <span class="n">destination_object</span><span class="o">=</span><span class="s1">&apos;copied_sales/2017/&apos;</span><span class="p">,</span>
    <span class="n">google_cloud_storage_conn_id</span><span class="o">=</span><span class="n">google_cloud_conn_id</span>
<span class="p">)</span>
</pre>
</div>
</div>
<p>The following Operator would move all the Avro files from <code class="docutils literal notranslate"><span class="pre">sales/sales-2017</span></code>
folder (i.e. with names starting with that prefix) in <code class="docutils literal notranslate"><span class="pre">data</span></code> bucket to the
same folder in the <code class="docutils literal notranslate"><span class="pre">data_backup</span></code> bucket, deleting the original files in the
process.</p>
<div class="last highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">move_files</span> <span class="o">=</span> <span class="n">GoogleCloudStorageToGoogleCloudStorageOperator</span><span class="p">(</span>
    <span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;move_files&apos;</span><span class="p">,</span>
    <span class="n">source_bucket</span><span class="o">=</span><span class="s1">&apos;data&apos;</span><span class="p">,</span>
    <span class="n">source_object</span><span class="o">=</span><span class="s1">&apos;sales/sales-2017/*.avro&apos;</span><span class="p">,</span>
    <span class="n">destination_bucket</span><span class="o">=</span><span class="s1">&apos;data_backup&apos;</span><span class="p">,</span>
    <span class="n">move_object</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
    <span class="n">google_cloud_storage_conn_id</span><span class="o">=</span><span class="n">google_cloud_conn_id</span>
<span class="p">)</span>
</pre>
</div>
</div>





</div>
</div>
<div class="section" id="googlecloudstoragehook">
<h4 class="sigil_not_in_toc">GoogleCloudStorageHook</h4>

<pre>
class airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook(google_cloud_storage_conn_id=&apos;google_cloud_default&apos;, delegate_to=None)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook" title="airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook</span></code></a></p>
<p>Interact with Google Cloud Storage. This hook uses the Google Cloud Platform
connection.</p>

<pre>
copy(source_bucket, source_object, destination_bucket=None, destination_object=None)</pre>
<p>Copies an object from one bucket to another, with renaming if requested.</p>
<p>destination_bucket or destination_object can be omitted, in which case the
source bucket/object is used; they cannot both be omitted.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>source_bucket</strong> (<em>string</em>) &#x2013; The bucket of the object to copy from.</li>
<li><strong>source_object</strong> (<em>string</em>) &#x2013; The object to copy.</li>
<li><strong>destination_bucket</strong> (<em>string</em>) &#x2013; The destination bucket the object will be copied to.
Can be omitted; then the same bucket is used.</li>
<li><strong>destination_object</strong> (<em>string</em>) &#x2013; The (renamed) path of the object if given.
Can be omitted; then the same name is used.</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
create_bucket(bucket_name, storage_class=&apos;MULTI_REGIONAL&apos;, location=&apos;US&apos;, project_id=None, labels=None)</pre>
<p>Creates a new bucket. Google Cloud Storage uses a flat namespace, so
you can&#x2019;t create a bucket with a name that is already in use.</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last">For more information, see Bucket Naming Guidelines:
<a class="reference external" href="https://cloud.google.com/storage/docs/bucketnaming.html#requirements">https://cloud.google.com/storage/docs/bucketnaming.html#requirements</a></p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>bucket_name</strong> (<em>string</em>) &#x2013; The name of the bucket.</li>
<li><strong>storage_class</strong> (<em>string</em>) &#x2013; <p>This defines how objects in the bucket are stored
and determines the SLA and the cost of storage. Values include</p>
<ul>
<li><code class="docutils literal notranslate"><span class="pre">MULTI_REGIONAL</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">REGIONAL</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">STANDARD</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">NEARLINE</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">COLDLINE</span></code>.</li>
</ul>
<p>If this value is not specified when the bucket is
created, it will default to STANDARD.</p>
</li>
<li><strong>location</strong> (<em>string</em>) &#x2013; <p>The location of the bucket.
Object data for objects in the bucket resides in physical storage
within this region. Defaults to US.</p>
<div class="admonition seealso">
<p class="first admonition-title">See also</p>
<p class="last"><a class="reference external" href="https://developers.google.com/storage/docs/bucket-locations">https://developers.google.com/storage/docs/bucket-locations</a></p>
</div>
</li>
<li><strong>project_id</strong> (<em>string</em>) &#x2013; The ID of the GCP Project.</li>
<li><strong>labels</strong> (<em>dict</em>) &#x2013; User-provided labels, in key/value pairs.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body"><p class="first last">If successful, it returns the <code class="docutils literal notranslate"><span class="pre">id</span></code> of the bucket.</p>
</td>
</tr>
</tbody>
</table>




<pre>
delete(bucket, object, generation=None)</pre>
<p>Deletes an object if versioning is not enabled for the bucket, or if the
generation parameter is used.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; name of the bucket, where the object resides</li>
<li><strong>object</strong> (<em>string</em>) &#x2013; name of the object to delete</li>
<li><strong>generation</strong> (<em>string</em>) &#x2013; if present, permanently delete the object of this generation</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body"><p class="first last">True if succeeded</p>
</td>
</tr>
</tbody>
</table>




<pre>
download(bucket, object, filename=None)</pre>
<p>Get a file from Google Cloud Storage.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; The bucket to fetch from.</li>
<li><strong>object</strong> (<em>string</em>) &#x2013; The object to fetch.</li>
<li><strong>filename</strong> (<em>string</em>) &#x2013; If set, a local file path where the file should be written to.</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
exists(bucket, object)</pre>
<p>Checks for the existence of a file in Google Cloud Storage.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; The Google cloud storage bucket where the object is.</li>
<li><strong>object</strong> (<em>string</em>) &#x2013; The name of the object to check in the Google cloud
storage bucket.</li>
</ul>
</td>
</tr>
</tbody>
</table>
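<p>A hedged usage sketch combining the hook methods above (the connection ID,
bucket and object names are illustrative, and the calls require valid GCP
credentials):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>hook = GoogleCloudStorageHook(google_cloud_storage_conn_id='google_cloud_default')
if hook.exists(bucket='data', object='sales/sales-2017/january.avro'):
    hook.download(bucket='data', object='sales/sales-2017/january.avro',
                  filename='/tmp/january.avro')
</pre>
</div>
</div>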




<pre>
get_conn()</pre>
<p>Returns a Google Cloud Storage service object.</p>




<pre>
get_crc32c(bucket, object)</pre>
<p>Gets the CRC32c checksum of an object in Google Cloud Storage.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; The Google cloud storage bucket where the object is.</li>
<li><strong>object</strong> (<em>string</em>) &#x2013; The name of the object to check in the Google cloud
storage bucket.</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
get_md5hash(bucket, object)</pre>
<p>Gets the MD5 hash of an object in Google Cloud Storage.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; The Google cloud storage bucket where the object is.</li>
<li><strong>object</strong> (<em>string</em>) &#x2013; The name of the object to check in the Google cloud
storage bucket.</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
get_size(bucket, object)</pre>
<p>Gets the size of a file in Google Cloud Storage.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; The Google cloud storage bucket where the object is.</li>
<li><strong>object</strong> (<em>string</em>) &#x2013; The name of the object to check in the Google cloud storage bucket.</li>
</ul>
</td>
</tr>
</tbody>
</table>




<pre>
is_updated_after(bucket, object, ts)</pre>
<p>Checks whether an object in Google Cloud Storage was updated after the given timestamp.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; The Google cloud storage bucket where the object is.</li>
<li><strong>object</strong> (<em>string</em>) &#x2013; The name of the object to check in the Google cloud
storage bucket.</li>
<li><strong>ts</strong> (<em>datetime</em>) &#x2013; The timestamp to check against.</li>
</ul>
</td>
</tr>
</tbody>
</table>
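<p>A typical use of <cite>is_updated_after</cite> is deciding whether downstream work needs to re-pull an object. A sketch under assumptions: the function name and the 24-hour default are illustrative, and <cite>hook</cite> is anything exposing the <cite>is_updated_after(bucket, object, ts)</cite> signature documented above:</p>

```python
from datetime import datetime, timedelta


def needs_refresh(hook, bucket, object, max_age_hours=24):
    """Return True if ``object`` has NOT changed within ``max_age_hours``.

    ``is_updated_after`` returns True when the object was updated after
    the cutoff (i.e. it is fresh), so a refresh is needed only when it
    has not been updated since then.
    """
    cutoff = datetime.utcnow() - timedelta(hours=max_age_hours)
    return not hook.is_updated_after(bucket, object, cutoff)
```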




<pre>
list(bucket, versions=None, maxResults=None, prefix=None, delimiter=None)</pre>
<p>Lists all objects in the bucket whose names begin with the given string prefix.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; bucket name</li>
<li><strong>versions</strong> (<em>boolean</em>) &#x2013; if true, list all versions of the objects</li>
<li><strong>maxResults</strong> (<em>integer</em>) &#x2013; max count of items to return in a single page of responses</li>
<li><strong>prefix</strong> (<em>string</em>) &#x2013; prefix string which filters objects whose name begin with
this prefix</li>
<li><strong>delimiter</strong> (<em>string</em>) &#x2013; filters objects based on the delimiter (e.g. &#x2018;.csv&#x2019;)</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body"><p class="first last">a stream of object names matching the filtering criteria</p>
</td>
</tr>
</tbody>
</table>
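<p>A sketch of combining <cite>prefix</cite> and <cite>delimiter</cite> to collect one day's CSV exports. The <cite>exports/YYYY-MM-DD/</cite> layout and the function name are assumptions for illustration; <cite>'.csv'</cite> follows the delimiter example in the parameter list above:</p>

```python
def daily_exports(hook, bucket, day):
    """Return the names of a day's CSV exports from ``bucket``.

    ``hook`` is assumed to be a GoogleCloudStorageHook (or anything with
    a compatible ``list`` method); ``day`` is a date string such as
    '2018-01-01' under a hypothetical exports/&lt;day&gt;/ layout.
    """
    prefix = 'exports/%s/' % day
    # prefix narrows the listing to the day's folder; delimiter='.csv'
    # filters the result down to the CSV objects.
    return hook.list(bucket, prefix=prefix, delimiter='.csv')
```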




<pre>
rewrite(source_bucket, source_object, destination_bucket, destination_object=None)</pre>
<p>Has the same functionality as copy, except that it also works on files
over 5 TB, as well as when copying between locations and/or storage
classes.</p>
<p>destination_object can be omitted, in which case source_object is used.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>source_bucket</strong> (<em>string</em>) &#x2013; The bucket of the object to copy from.</li>
<li><strong>source_object</strong> (<em>string</em>) &#x2013; The object to copy.</li>
<li><strong>destination_bucket</strong> (<em>string</em>) &#x2013; The destination bucket the object is copied to.</li>
<li><strong>destination_object</strong> &#x2013; The (renamed) path of the destination object, if given.
Can be omitted; then the source object&#x2019;s name is used.</li>
</ul>
</td>
</tr>
</tbody>
</table>
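<p>Because <cite>rewrite</cite> handles objects over 5 TB and crosses locations and storage classes, it is the safer default for archival copies. A sketch; the function and bucket names are placeholders:</p>

```python
def archive_object(hook, source_bucket, object, archive_bucket):
    """Copy ``object`` into an archive bucket, keeping its name.

    ``rewrite`` is used instead of ``copy`` so the same call works for
    very large objects and across storage classes/locations.
    """
    # destination_object is omitted, so the source object's name is reused.
    hook.rewrite(source_bucket, object, archive_bucket)
```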




<pre>
upload(bucket, object, filename, mime_type=&apos;application/octet-stream&apos;)</pre>
<p>Uploads a local file to Google Cloud Storage.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first last simple">
<li><strong>bucket</strong> (<em>string</em>) &#x2013; The bucket to upload to.</li>
<li><strong>object</strong> (<em>string</em>) &#x2013; The object name to set when uploading the local file.</li>
<li><strong>filename</strong> (<em>string</em>) &#x2013; The local file path to the file to be uploaded.</li>
<li><strong>mime_type</strong> (<em>string</em>) &#x2013; The MIME type to set when uploading the file.</li>
</ul>
</td>
</tr>
</tbody>
</table>
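<p>Pairing <cite>upload</cite> with <cite>get_size</cite> gives a quick sanity check that the whole file arrived. A sketch, assuming <cite>hook</cite> is a <cite>GoogleCloudStorageHook</cite> and <cite>get_size</cite> returns the object&#x2019;s size in bytes; the function name and MIME type are illustrative:</p>

```python
import os


def upload_and_verify(hook, bucket, object, filename):
    """Upload a local file and confirm GCS reports the same byte count."""
    hook.upload(bucket, object, filename, mime_type='text/csv')
    remote = int(hook.get_size(bucket, object))
    local = os.path.getsize(filename)
    if remote != local:
        raise ValueError('size mismatch: %d != %d' % (remote, local))
    return remote
```

<p>For stronger guarantees, compare checksums (see <cite>get_md5hash</cite> / <cite>get_crc32c</cite> above) rather than sizes alone.</p>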






</div>
</div>
<div class="section" id="google-kubernetes-engine">
<h3 class="sigil_not_in_toc">Google Kubernetes Engine</h3>
<div class="section" id="google-kubernetes-engine-cluster-operators">
<h4 class="sigil_not_in_toc">Google Kubernetes Engine Cluster Operators</h4>
<ul class="simple">
<li><a class="reference internal" href="#gkeclustercreateoperator"><span class="std std-ref">GKEClusterCreateOperator</span></a> : Creates a Kubernetes Cluster in Google Cloud Platform</li>
<li><a class="reference internal" href="#id67"><span class="std std-ref">GKEClusterDeleteOperator</span></a> : Deletes a Kubernetes Cluster in Google Cloud Platform</li>
<li><a class="reference internal" href="#id68"><span class="std std-ref">GKEPodOperator</span></a> : Executes a task in a pod in a Google Kubernetes Engine cluster</li>
</ul>
<div class="section" id="gkeclustercreateoperator">
<h5 class="sigil_not_in_toc">GKEClusterCreateOperator</h5>

<pre>
class airflow.contrib.operators.gcp_container_operator.GKEClusterCreateOperator(project_id, location, body={}, gcp_conn_id=&apos;google_cloud_default&apos;, api_version=&apos;v2&apos;, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>



</div>
<div class="section" id="gkeclusterdeleteoperator">
<span id="id67"></span><h5 class="sigil_not_in_toc">GKEClusterDeleteOperator</h5>

<pre>
class airflow.contrib.operators.gcp_container_operator.GKEClusterDeleteOperator(project_id, name, location, gcp_conn_id=&apos;google_cloud_default&apos;, api_version=&apos;v2&apos;, *args, **kwargs)</pre>
<p>Bases: <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a></p>



</div>
<div class="section" id="gkepodoperator">
<span id="id68"></span><h5 class="sigil_not_in_toc">GKEPodOperator</h5>
</div>
</div>
<div class="section" id="google-kubernetes-engine-hook">
<span id="id69"></span><h4 class="sigil_not_in_toc">Google Kubernetes Engine Hook</h4>

<pre>
class airflow.contrib.hooks.gcp_container_hook.GKEClusterHook(project_id, location)</pre>
<p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.hooks.base_hook.BaseHook</span></code></p>

<pre>
create_cluster(cluster, retry=&lt;object object&gt;, timeout=&lt;object object&gt;)</pre>
<p>Creates a cluster, consisting of the specified number and type of Google Compute
Engine instances.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>cluster</strong> (<em>dict</em><em> or </em><em>google.cloud.container_v1.types.Cluster</em>) &#x2013; A Cluster protobuf or dict. If dict is provided, it must be of
the same form as the protobuf message google.cloud.container_v1.types.Cluster</li>
<li><strong>retry</strong> (<em>google.api_core.retry.Retry</em>) &#x2013; A retry object (google.api_core.retry.Retry) used to retry requests.
If None is specified, requests will not be retried.</li>
<li><strong>timeout</strong> (<em>float</em>) &#x2013; The amount of time, in seconds, to wait for the request to
complete. Note that if retry is specified, the timeout applies to each
individual attempt.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body"><p class="first last">The full url to the new, or existing, cluster</p>
</td>
</tr>
</tbody>
</table>

<p>Raises <cite>ParseError</cite> on JSON parsing problems when trying to convert the dict,
and <cite>AirflowException</cite> if <cite>cluster</cite> is neither a dict nor a Cluster proto.</p>
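<p>As noted above, <cite>cluster</cite> may be passed as a plain dict shaped like the <cite>google.cloud.container_v1.types.Cluster</cite> message. A minimal, illustrative body; the name, project, and zone are placeholders:</p>

```python
# Minimal dict in the shape of google.cloud.container_v1.types.Cluster.
# Field names follow the proto's snake_case; the values are placeholders.
cluster_body = {
    'name': 'example-cluster',    # hypothetical cluster name
    'initial_node_count': 1,      # a single default-pool node
}

# The hook would then be used roughly like this (not run here):
# hook = GKEClusterHook(project_id='my-project', location='us-central1-a')
# hook.create_cluster(cluster=cluster_body)
```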





<pre>
delete_cluster(name, retry=&lt;object object&gt;, timeout=&lt;object object&gt;)</pre>
<p>Deletes the cluster, including the Kubernetes endpoint and all
worker nodes. Firewalls and routes that were configured during
cluster creation are also deleted. Other Google Compute Engine
resources that might be in use by the cluster (e.g. load balancer
resources) will not be deleted if they weren&#x2019;t present at the
initial create time.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>name</strong> (<em>str</em>) &#x2013; The name of the cluster to delete</li>
<li><strong>retry</strong> (<em>google.api_core.retry.Retry</em>) &#x2013; Retry object used to determine when/if to retry requests.
If None is specified, requests will not be retried.</li>
<li><strong>timeout</strong> (<em>float</em>) &#x2013; The amount of time, in seconds, to wait for the request to
complete. Note that if retry is specified, the timeout applies to each
individual attempt.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body"><p class="first last">The full url to the delete operation if successful, else None</p>
</td>
</tr>
</tbody>
</table>




<pre>
get_cluster(name, retry=&lt;object object&gt;, timeout=&lt;object object&gt;)</pre>
<p>Gets details of the specified cluster.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><ul class="first simple">
<li><strong>name</strong> (<em>str</em>) &#x2013; The name of the cluster to retrieve</li>
<li><strong>retry</strong> (<em>google.api_core.retry.Retry</em>) &#x2013; A retry object used to retry requests.
If None is specified, requests will not be retried.</li>
<li><strong>timeout</strong> (<em>float</em>) &#x2013; The amount of time, in seconds, to wait for the request to
complete. Note that if retry is specified, the timeout applies to each
individual attempt.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body"><p class="first last">A google.cloud.container_v1.types.Cluster instance</p>
</td>
</tr>
</tbody>
</table>




<pre>
get_operation(operation_name)</pre>
<p>Fetches the operation from Google Cloud.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>operation_name</strong> (<em>str</em>) &#x2013; Name of the operation to fetch</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body">The new, updated operation from Google Cloud</td>
</tr>
</tbody>
</table>




<pre>
wait_for_operation(operation)</pre>
<p>Given an operation, continuously fetches its status from Google Cloud until it
either completes or an error occurs.</p>
<table class="docutils field-list" frame="void" rules="none">
<colgroup><col class="field-name">
<col class="field-body">
</colgroup>
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th>
<td class="field-body"><strong>operation</strong> (<em>google.cloud.container_v1.gapic.enums.Operator</em>) &#x2013; The Operation to wait for</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th>
<td class="field-body">A new, updated operation fetched from Google Cloud</td>
</tr>
</tbody>
</table>
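<p>The polling loop behind <cite>wait_for_operation</cite> can be sketched as follows. This is an illustration only: it assumes the operation object exposes a <cite>status</cite> attribute, that <cite>'RUNNING'</cite> marks an in-progress operation, and the function name is hypothetical:</p>

```python
import time


def wait_until_done(hook, operation_name, poll_seconds=10):
    """Poll an operation by name until it leaves the RUNNING state.

    Re-fetches the operation via ``get_operation`` every ``poll_seconds``
    and returns the final operation object once its status changes.
    """
    while True:
        op = hook.get_operation(operation_name)
        if str(op.status) != 'RUNNING':
            return op
        time.sleep(poll_seconds)
```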






</div>
</div>
</div>
</body>
</html>